Automatically curated data sets


Kessel, Marcus ; Atkinson, Colin



DOI: https://doi.org/10.1109/SCAM.2019.00010
URL: https://ieeexplore.ieee.org/document/8930881
Document type: Conference publication
Year of publication: 2019
Book title: SCAM 2019 : 19th International Working Conference on Source Code Analysis and Manipulation, September 30 - October 1, 2019, Cleveland, Ohio : proceedings
Page range: 56-61
Event title: 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)
Event location: Cleveland, OH
Event date: September 30 - October 1, 2019
Editor: O'Conner, Lisa
Place of publication: Los Alamitos, CA [et al.]
Publisher: IEEE
ISBN: 978-1-7281-4938-7 , 978-1-7281-4937-0
ISSN: 1942-5430 , 2470-6892
Publication language: English
Institution: Fakultät für Wirtschaftsinformatik und Wirtschaftsmathematik > Software Engineering (Atkinson 2003-)
Subject: 004 Computer science
Abstract: To validate hypotheses and tools that depend on the semantics of software, it is necessary to assemble, prepare and maintain (i.e., curate) large, high-quality corpora of executable software systems exhibiting certain desired behavior and/or properties. Today this is a highly tedious and laborious activity requiring significant human time and effort. In this paper we therefore present a prototype platform that supports the notion of “live data sets” where almost all aspects of the data set curation process are automated. Instead of curating data sets by hand, or writing dedicated tools to select and check software samples on a case-by-case basis, a live data set allows users to simply describe their requirements as abstract scripts written in a declarative domain-specific language. After explaining the approach and the key ideas behind its implementation, we present two examples of executable corpora generated automatically from a live data set populated from Maven Central. The first illustrates a “semantics agnostic” use case where the actual behavior of the software is unimportant, while the second illustrates a “semantics specific” use case where software implementing a specific functional abstraction is selected.




This entry is part of the university bibliography.



