Web-scale web table to knowledge base matching
Ritze, Dominique
URL: https://ub-madoc.bib.uni-mannheim.de/43123
URN: urn:nbn:de:bsz:180-madoc-431233
Document type: Dissertation
Year of publication: 2017
Place of publication: Mannheim
University: Universität Mannheim
Reviewer: Bizer, Christian
Date of oral examination: 6 November 2017
Language of publication: English
Institution: Fakultät für Wirtschaftsinformatik und Wirtschaftsmathematik > Information Systems V: Web-based Systems (Bizer 2012-)
Subject: 004 Computer science
Controlled keywords (SWD): Matching
Free keywords (English): Web Table, Matching, Knowledge Base
Abstract:
Millions of relational HTML tables are found on the World Wide Web. In contrast
to unstructured text, relational web tables provide a compact representation of entities
described by attributes. The data within these tables covers a broad topical
range. Web table data is used for question answering, augmentation of search results,
and knowledge base completion. Until a few years ago, only search engine
companies like Google and Microsoft owned large web crawls from which web
tables are extracted. Thus, researchers outside these companies have not been able to
work with web tables.
In this thesis, the first publicly available web table corpus containing millions of
web tables is introduced. The corpus enables interested researchers to experiment
with web tables. A profile of the corpus is created to give insight into its characteristics
and topics. Further, the potential of web tables for augmenting cross-domain
knowledge bases is investigated. For the use case of knowledge base augmentation,
it is necessary to understand the web table content. For this reason, web
tables are matched to a knowledge base. The matching comprises three matching
tasks: instance, property, and class matching. Existing web table to knowledge
base matching systems either focus on a subset of these matching tasks or are evaluated
using gold standards that likewise cover only a subset of the challenges that
arise when matching web tables to knowledge bases.
This thesis systematically evaluates the utility of a wide range of different features
for the web table to knowledge base matching task using a single gold standard.
The results of the evaluation are used afterwards to design a holistic matching
method which covers all matching tasks and outperforms state-of-the-art web table
to knowledge base matching systems. In order to achieve these goals, we first
propose the T2K Match algorithm which addresses all three matching tasks in an
integrated fashion. In addition, we introduce the T2D gold standard which covers
a wide variety of challenges. By evaluating T2K Match against the T2D gold standard,
we find that considering only the table content is insufficient. Hence, we
include features of three categories: features found in the table, features found in the table context
such as the page title, and features based on external resources such as a synonym dictionary.
We analyze the utility of the features for each matching task. The analysis
shows that certain problems cannot be overcome by matching each table in isolation
to the knowledge base. In addition, relying on the features is not enough for the
property matching task. Based on these findings, we extend T2K Match into T2K
Match++ which exploits indirect matches to web tables about the same topic and
uses knowledge derived from the knowledge base. We show that T2K Match++
outperforms all state-of-the-art web table to knowledge base matching approaches
on the T2D and Limaye gold standards. Most systems show good results on one
matching task, but T2K Match++ is the only system that achieves F-measure scores
above 0.8 for all tasks. Compared to the results of the best-performing system, TableMiner+,
the F-measure increases by 0.08 for the difficult property matching task,
and by 0.05 and 0.03 for the class and instance matching tasks, respectively.
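The abstract reports its results as F-measure scores. As a reading aid, the minimal sketch below shows the balanced F-measure (F1), the harmonic mean of precision and recall. It assumes the thesis uses this common balanced form, which the abstract does not state; the function name `f_measure` and the input values are hypothetical and not taken from the thesis.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the balanced form is used; the abstract does not
    specify which F-measure variant the thesis reports.
    """
    if precision + recall == 0:
        # No correct correspondences found at all: define the score as 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical precision and recall, for illustration only.
score = f_measure(0.9, 0.75)
print(round(score, 3))
```

Because the harmonic mean is dominated by the lower of the two values, a system needs both high precision and high recall on a task to reach a high F-measure on it.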
This entry is part of the university bibliography.
The document is provided by the publication server of the Mannheim University Library.