Neural data search for table augmentation
Brinkmann, Alexander
URL: https://ceur-ws.org/Vol-3379/PhDWorkshop_2023_brin...
URN: urn:nbn:de:bsz:180-madoc-643986
Document Type: Conference or workshop publication
Year of publication: 2023
Book title: Proceedings of the Workshops of the EDBT/ICDT 2023 Joint Conference, Ioannina, Greece, March 28, 2023
Publication series: CEUR Workshop Proceedings
Volume: 3379
Page range: 1-4
Conference title: EDBT/ICDT Workshops 2023
Location of the conference venue: Ioannina, Greece
Date of the conference: 28.03.2023
Editors: Fletcher, George; Kantere, Verena
Place of publication: Aachen, Germany
Publishing house: RWTH Aachen
ISSN: 1613-0073
Publication language: English
Institution: School of Business Informatics and Mathematics > Information Systems V: Web-based Systems (Bizer 2012-)
Pre-existing license: Creative Commons Attribution 4.0 International (CC BY 4.0)
Subject: 004 Computer science, internet
Abstract:
Tabular data is widely available on the web and in private data lakes run by commercial companies or research institutes. However, the data needed for a specific task is often scattered across numerous tables in these data lakes, and the relevant information must first be retrieved before it can be used. One approach to retrieving this data is table augmentation, which adds a new attribute to a query table and populates its values with data from the data lake. My research focuses on evaluating methods for augmenting a table with an additional attribute. Table augmentation poses a variety of challenges due to the heterogeneity of the data sources and the large number of possible combinations of methods. To augment a query table successfully from tabular data in a data lake, several tasks must be performed, such as data normalization, data search, schema matching, information extraction and data fusion. In my work, I empirically compare methods for data search, information extraction and data fusion, as well as complete table augmentation pipelines, on different datasets of tabular data found in real-world data lakes. Methodologically, I plan to introduce new neural techniques for data search, information extraction and data fusion in the context of table augmentation. These new methods, as well as existing symbolic data search methods for table augmentation, will be evaluated empirically on two sets of benchmark query tables. The aim is to identify task- and dataset-specific challenges for data search, information extraction and data fusion methods. By profiling the datasets and analyzing the errors the evaluated methods make on the test query tables, the strengths and weaknesses of each method can be identified systematically. Data search and information extraction methods should maximize recall, while data fusion methods should achieve high accuracy. Pipelines built on the new methods should deliver their results quickly while keeping the accuracy of the augmented attribute values as high as possible.
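The pipeline stages named in the abstract (normalization, data search, schema matching, information extraction, data fusion) and the two evaluation goals (recall for search and extraction, accuracy for fusion) can be illustrated with a deliberately simple, purely symbolic sketch. It is not the author's method: the neural techniques the abstract proposes are not shown. The sketch assumes that tables are lists of Python dicts with a uniform schema, that schema matching is exact header-name equality, that rows are found by exact match on a normalized entity-name key column, and that fusion is a majority vote. Recall and accuracy are defined per query row as an assumption. All function names (`search_tables`, `extract_candidates`, `fuse`, `augment`, `evaluate`) and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical data model: a table is a list of rows, a row maps headers to values.
Row = Dict[str, str]
Table = List[Row]
Candidates = Dict[str, List[str]]


def normalize(value: str) -> str:
    # Data normalization (simplified): lowercase and collapse whitespace.
    return " ".join(value.lower().split())


def search_tables(query: Table, lake: List[Table], key: str, target: str) -> List[Table]:
    # Data search with naive schema matching: keep lake tables whose first row
    # has both the key and the target header (assumes a uniform schema per table),
    # then rank them by how many query entities they mention; ties keep lake order.
    query_keys = {normalize(row[key]) for row in query}
    scored = []
    for table in lake:
        if not table or key not in table[0] or target not in table[0]:
            continue
        overlap = len(query_keys & {normalize(row[key]) for row in table})
        if overlap > 0:
            scored.append((overlap, table))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort
    return [table for _, table in scored]


def extract_candidates(query: Table, tables: List[Table], key: str, target: str) -> Candidates:
    # Information extraction: gather every non-empty target value that the
    # retrieved tables report for each query entity.
    candidates: Candidates = {normalize(row[key]): [] for row in query}
    for table in tables:
        for row in table:
            entity = normalize(row[key])
            if entity in candidates and row.get(target):
                candidates[entity].append(normalize(row[target]))
    return candidates


def fuse(values: List[str]) -> Optional[str]:
    # Data fusion: majority vote; Counter.most_common breaks ties by first occurrence.
    if not values:
        return None
    return Counter(values).most_common(1)[0][0]


def augment(query: Table, lake: List[Table], key: str, target: str) -> Tuple[List[dict], Candidates]:
    # Full pipeline: search -> extract -> fuse, adding `target` to every query row.
    tables = search_tables(query, lake, key, target)
    candidates = extract_candidates(query, tables, key, target)
    augmented = [{**row, target: fuse(candidates[normalize(row[key])])} for row in query]
    return augmented, candidates


def evaluate(augmented: List[dict], candidates: Candidates, gold: Dict[str, str],
             key: str, target: str) -> Tuple[float, float]:
    # Recall: share of query rows whose gold value is among the extracted candidates.
    # Accuracy: share of query rows whose fused value equals the gold value.
    n = len(augmented)
    hits = sum(normalize(gold[row[key]]) in candidates[normalize(row[key])] for row in augmented)
    correct = sum(row[target] == normalize(gold[row[key]]) for row in augmented)
    return hits / n, correct / n


# Toy example: the query table asks for the "country" of three cities.
query = [{"city": "Mannheim"}, {"city": "Ioannina"}, {"city": "Aachen"}]
lake = [
    [{"city": "Mannheim", "country": "Germany"}, {"city": "Ioannina", "country": "Greece"}],
    [{"city": "mannheim ", "country": "France"}],  # noisy source
    [{"city": "Mannheim", "country": "france"}],   # noisy source, different casing
    [{"town": "Aachen", "country": "Germany"}],    # header "town" is not matched
]
gold = {"Mannheim": "Germany", "Ioannina": "Greece", "Aachen": "Germany"}

augmented, candidates = augment(query, lake, key="city", target="country")
recall, accuracy = evaluate(augmented, candidates, gold, key="city", target="country")
print(f"recall={recall:.2f} accuracy={accuracy:.2f}")  # recall=0.67 accuracy=0.33
```

The toy run shows why the abstract sets different goals for the stages. For Mannheim, search and extraction do find the correct value (a recall hit), but two noisy sources outvote it, so majority-vote fusion still produces a wrong value. For Aachen, the only relevant table uses the header `town`, so exact header matching misses it entirely; this is the kind of gap that better schema matching or data search would need to close.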
This entry is part of the University Bibliography.
The document is provided by the publication server of Mannheim University Library.