Density- and correlation-based table extension
Kleppmann, Benedikt; Bizer, Christian; Yaqub, Edwin; Temme, Fabian; Schlunder, Philipp; Arnu, David; Klinkenberg, Ralf

URL: http://ceur-ws.org/Vol-2191/paper23.pdf
Additional URL: http://ceur-ws.org/Vol-2191/
Document type: Conference or workshop publication
Year of publication: 2018
Book title: LWDA 2018: Proceedings of the Conference "Lernen, Wissen, Daten, Analysen", Mannheim, Germany, August 22-24, 2018
Series: CEUR Workshop Proceedings
Volume: 2191
Page range: 191-194
Conference title: Lernen, Wissen, Daten, Analysen 2018
Conference venue: Mannheim, Germany
Conference date: August 22-24, 2018
Publisher: Gemulla, Rainer
Place of publication: Aachen, Germany
Publishing house: RWTH Aachen
ISSN: 1613-0073
Publication language: English
Institution: School of Business Informatics and Mathematics > Wirtschaftsinformatik V (Bizer)
Subject: 004 Computer science, internet
Keywords (English): data discovery, table extension, holistic matching, web tables
Abstract:
With thousands of data sources available on the Web as well as within organizations, data scientists increasingly spend more of their time searching for data than analyzing it. To ease the task of finding relevant data for data mining projects, this paper presents two data discovery and data integration methods developed in a joint research project by RapidMiner Research and the University of Mannheim. Given a corpus of relational tables, the methods extend a query table with additional attributes and automatically fill these new attributes with data values from the corpus. The first method, density-based table extension, extends the query table with all attributes that can be filled with data values from the corpus densely enough to reach a user-specified density threshold. The second method, correlation-based table extension, extends the query table with all attributes that correlate with a specific attribute of the query table. Both methods are integrated as operators into RapidMiner Studio, a popular data mining environment. This enables data scientists to search for data and apply a wide range of mining methods to the discovered data within the same environment.
This entry is part of the University Bibliography.
ORCID (Bizer, Christian): 0000-0003-2367-0237