Density- and correlation-based table extension


Kleppmann, Benedikt ; Bizer, Christian ; Yaqub, Edwin ; Temme, Fabian ; Schlunder, Philipp ; Arnu, David ; Klinkenberg, Ralf



URL: http://ceur-ws.org/Vol-2191/paper23.pdf
Additional URL: http://ceur-ws.org/Vol-2191/
Document Type: Conference or workshop publication
Year of publication: 2018
Book title: LWDA 2018 : Proceedings of the Conference "Lernen, Wissen, Daten, Analysen" Mannheim, Germany, August 22-24, 2018
Journal or publication series: CEUR Workshop Proceedings
Volume: 2191
Page range: 191-194
Conference title: Lernen, Wissen, Daten, Analysen 2018
Location of the conference venue: Mannheim, Germany
Date of the conference: August 22-24, 2018
Editor: Gemulla, Rainer
Place of publication: Aachen, Germany
Publishing house: RWTH Aachen
ISSN: 1613-0073
Publication language: English
Institution: School of Business Informatics and Mathematics > Information Systems V: Web-based Systems (Bizer 2012-)
Subject: 004 Computer science, internet
Keywords (English): data discovery, table extension, holistic matching, web tables
Abstract: With thousands of data sources available on the Web as well as within organizations, data scientists increasingly spend more time searching for data than analyzing it. In order to ease the task of finding relevant data for data mining projects, this paper presents two data discovery and data integration methods that have been developed in a joint research project by RapidMiner Research and the University of Mannheim. Given a corpus of relational tables, the methods extend a query table with additional attributes and automatically fill these new attributes with data values from the corpus. The first method, density-based table extension, extends the query table with all attributes that can be filled with data values so that a user-specified density threshold is reached. The second method, correlation-based table extension, extends the query table with all attributes that correlate with a specific attribute of the query table. Both methods are integrated as operators into RapidMiner Studio, a popular data mining environment. This enables data scientists to search for data and apply a wide range of different mining methods to the discovered data within the same environment.
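Illustration (not part of the original record): the two selection criteria named in the abstract can be sketched as simple column filters. This is a minimal sketch under stated assumptions, not the authors' implementation or the RapidMiner operators. It assumes candidate attributes have already been matched to the rows of the query table, with `None` marking cells that could not be filled. The function names, the use of Pearson correlation, the thresholds, and the sample data are all hypothetical.

```python
from math import sqrt


def density(values):
    """Fraction of cells in a column that are filled (not None)."""
    return sum(v is not None for v in values) / len(values) if values else 0.0


def pearson(xs, ys):
    """Pearson correlation over rows where both values are present.

    Returns 0.0 when fewer than two complete rows exist or when either
    column is constant (an assumption made here for simplicity).
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def density_based_extension(candidates, min_density):
    """Keep candidate attributes whose fill density reaches the threshold."""
    return {name: col for name, col in candidates.items()
            if density(col) >= min_density}


def correlation_based_extension(candidates, target, min_abs_corr):
    """Keep candidate attributes that correlate with the target attribute."""
    return {name: col for name, col in candidates.items()
            if abs(pearson(col, target)) >= min_abs_corr}


# Hypothetical data: one target attribute of the query table and three
# candidate attributes aligned to the same four rows.
target = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "population": [10.0, 20.0, None, 40.0],  # mostly filled, tracks target
    "area": [None, None, None, 5.0],         # sparse
    "code": [3.0, 1.0, 4.0, 2.0],            # fully filled, uncorrelated
}

dense = density_based_extension(candidates, min_density=0.7)
correlated = correlation_based_extension(candidates, target, min_abs_corr=0.5)
```

With this sample data, the density filter keeps `population` and `code`, while the correlation filter keeps only `population`.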




This entry is part of the university bibliography.



