Automatically curated data sets


Kessel, Marcus ; Atkinson, Colin


DOI: https://doi.org/10.1109/SCAM.2019.00010
URL: https://ieeexplore.ieee.org/document/8930881
Document Type: Conference or workshop publication
Year of publication: 2019
Book title: SCAM 2019 : 19th International Working Conference on Source Code Analysis and Manipulation, September 30 - October 1, 2019, Cleveland, Ohio : proceedings
Page range: 56-61
Conference title: 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)
Location of the conference venue: Cleveland, OH
Date of the conference: September 30 - October 1, 2019
Author/Publisher of the book (only the first ones mentioned): O'Conner, Lisa
Place of publication: Los Alamitos, CA [u.a.]
Publishing house: IEEE
ISBN: 978-1-7281-4938-7, 978-1-7281-4937-0
ISSN: 1942-5430, 2470-6892
Publication language: English
Institution: School of Business Informatics and Mathematics > Softwaretechnik (Atkinson)
Subject: 004 Computer science, internet
Abstract: To validate hypotheses and tools that depend on the semantics of software, it is necessary to assemble, prepare and maintain (i.e., curate) large, high-quality corpora of executable software systems exhibiting certain desired behavior and/or properties. Today this is a highly tedious and laborious activity requiring significant human time and effort. In this paper we therefore present a prototype platform that supports the notion of “live data sets” where almost all aspects of the data set curation process are automated. Instead of curating data sets by hand, or writing dedicated tools to select and check software samples on a case-by-case basis, a live data set allows users to simply describe their requirements as abstract scripts written in a declarative domain-specific language. After explaining the approach and the key ideas behind its implementation, we present two examples of executable corpora generated automatically from a live data set populated from Maven Central. The first illustrates a “semantics-agnostic” use case where the actual behavior of the software is unimportant, while the second illustrates a “semantics-specific” use case where software implementing a specific functional abstraction is selected.

This entry is part of the university bibliography.





