Knowledge graph exploration for natural language understanding in web information retrieval
Schuhmacher, Michael
URL: https://madoc.bib.uni-mannheim.de/41485
URN: urn:nbn:de:bsz:180-madoc-414859
Document type: Doctoral dissertation
Year of publication: 2016
Place of publication: Mannheim
University: Universität Mannheim
Reviewer: Ponzetto, Simone Paolo
Date of oral examination: 11 November 2016
Language of publication: English
Institution: Fakultät für Wirtschaftsinformatik und Wirtschaftsmathematik > Semantic Web (Juniorprofessur) (Ponzetto 2013-2015); Fakultät für Wirtschaftsinformatik und Wirtschaftsmathematik > Practical Computer Science II: Artificial Intelligence (Stuckenschmidt 2009-)
Subject: 004 Computer science
Controlled keywords (SWD): Information Retrieval, Semantic Web, Wissensrepräsentation (knowledge representation)
Free keywords (English): Information Retrieval, Semantic Web, Knowledge Bases
Abstract:
In this thesis, we study methods to leverage information from fully-structured knowledge bases
(KBs), in particular the encyclopedic knowledge graph (KG) DBpedia, for different text-related
tasks from the area of information retrieval (IR) and natural language processing (NLP). The
key idea is to apply entity linking (EL) methods that identify mentions of KB entities in text,
and then exploit the structured information within KGs. Developing entity-centric methods for
text understanding using KG exploration is the focus of this work.
We aim to show that structured background knowledge is a means for improving performance in
different IR and NLP tasks that traditionally only make use of the unstructured text input itself.
Here, the KB entities mentioned in text act as the connection between the unstructured text and
the structured KG. We focus in particular on how best to leverage the knowledge contained in
fully-structured (RDF) KGs such as DBpedia, with their labeled edges (predicates). This is in
contrast to the previous Wikipedia-based approaches we build upon, which typically rely
on unlabeled graphs only. The contributions of this thesis are organized along its three parts:
In Part I, we apply EL and semantify short text snippets with KB entities. Although we retrieve only
types and categories from DBpedia for each entity, we are able to leverage this information
to create semantically coherent clusters of text snippets. This pipeline of connecting text to
background knowledge via the mentioned entities will be reused in all following chapters.
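The Part I pipeline (link entities, fetch their types and categories, cluster snippets) can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the lookup tables are invented stand-ins for an entity linker and for DBpedia type data, and the Jaccard-based greedy clustering is a simple placeholder rather than the clustering method used in the thesis.

```python
# Toy lookup tables: hypothetical stand-ins for an entity linker and for
# DBpedia type/category data. All entries are invented for illustration.
MENTION_TO_ENTITY = {
    "jaguar": "Jaguar_Cars",
    "bmw": "BMW",
    "puma": "Cougar",
    "lion": "Lion",
}
ENTITY_TYPES = {
    "Jaguar_Cars": {"Company", "Automobile_Manufacturer"},
    "BMW": {"Company", "Automobile_Manufacturer"},
    "Cougar": {"Animal", "Mammal"},
    "Lion": {"Animal", "Mammal"},
}


def link_entities(snippet):
    """Naive dictionary-based entity linking on lowercase tokens."""
    tokens = snippet.lower().split()
    return {MENTION_TO_ENTITY[t] for t in tokens if t in MENTION_TO_ENTITY}


def snippet_types(snippet):
    """Union of the types of all entities linked in the snippet."""
    types = set()
    for entity in link_entities(snippet):
        types |= ENTITY_TYPES.get(entity, set())
    return types


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_snippets(snippets, threshold=0.5):
    """Greedy single-pass clustering: a snippet joins the first cluster
    whose seed type set is similar enough, else it starts a new cluster."""
    clusters = []  # list of (seed_types, member_snippets)
    for snippet in snippets:
        types = snippet_types(snippet)
        for seed_types, members in clusters:
            if jaccard(types, seed_types) >= threshold:
                members.append(snippet)
                break
        else:
            clusters.append((types, [snippet]))
    return [members for _, members in clusters]
```

Calling `cluster_snippets(["new jaguar model", "bmw sales rise", "puma spotted", "lion pride"])` groups the two car snippets and the two animal snippets separately, because each pair shares the same toy type set.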
In Part II, we focus on semantic similarity and extend the idea of semantifying text with entities
by proposing in Chapter 5 a model that represents whole documents by their entities. In this
model, comparing documents semantically with each other is viewed as the task of comparing
the semantic relatedness of the respective entities, which we address in Chapter 4. We propose
an unsupervised graph weighting scheme and show that weighting the DBpedia KG leads to
better results on an existing entity ranking dataset. The exploration of weighted KG paths also
turns out to be useful for disambiguating the entities from an open information extraction
(OIE) system in Chapter 6. With this weighting scheme, the integration of KG information
for computing semantic document similarity in Chapter 5 becomes the task of comparing the two
KG subgraphs with each other, which we address through approximate subgraph matching. Based
on a well-established evaluation dataset for semantic document similarity, we show that our unsupervised
method performs competitively with other state-of-the-art methods.
Our results from this part indicate that KGs can contain helpful background knowledge, in particular
when exploring KG paths, but that selecting the relevant parts of the graph is an important
yet difficult challenge.
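To make the idea of Part II concrete, the following sketch scores entity relatedness by the cheapest weighted path between two entities in a labeled graph, and compares documents by best-match pairing of their entities. It rests on assumptions throughout: the triples and the per-predicate costs are invented, the abstract does not specify the actual weighting scheme, and the best-match average is only a simple stand-in for the approximate subgraph matching mentioned above.

```python
import heapq

# Toy labeled graph as (subject, predicate, object) triples; invented data.
TRIPLES = [
    ("Mannheim", "country", "Germany"),
    ("Heidelberg", "country", "Germany"),
    ("Mannheim", "type", "City"),
    ("Heidelberg", "type", "City"),
    ("Paris", "type", "City"),
    ("Paris", "country", "France"),
]
# Hypothetical per-predicate traversal costs (lower = more informative edge).
PREDICATE_COST = {"country": 1.0, "type": 2.0}


def build_graph(triples):
    """Undirected adjacency list with edge costs taken from the predicate."""
    graph = {}
    for subj, pred, obj in triples:
        cost = PREDICATE_COST[pred]
        graph.setdefault(subj, []).append((obj, cost))
        graph.setdefault(obj, []).append((subj, cost))
    return graph


def cheapest_path_cost(graph, source, target):
    """Dijkstra's algorithm; returns infinity if no path exists."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return float("inf")


def relatedness(graph, a, b):
    """Map path cost to (0, 1]; unreachable pairs score 0.0."""
    return 1.0 / (1.0 + cheapest_path_cost(graph, a, b))


def document_similarity(graph, entities_a, entities_b):
    """Symmetric average of each entity's best match in the other document."""
    if not entities_a or not entities_b:
        return 0.0

    def directed(xs, ys):
        return sum(max(relatedness(graph, x, y) for y in ys) for x in xs) / len(xs)

    return 0.5 * (directed(entities_a, entities_b) + directed(entities_b, entities_a))
```

With these toy costs, Mannheim and Heidelberg are connected most cheaply through the shared `country` edge, so they score as more related than Mannheim and Paris, which are connected only through the costlier `type` edges.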
In Part III, we shift to the task of relevance ranking and first study in Chapter 7 how to best
retrieve KB entities for a given keyword query. Combining again text with KB information, we
extract entities from the top-k retrieved, query-specific documents and then link the documents
to two different KBs, namely Wikipedia and DBpedia. In a learning-to-rank setting, we study
extensively which features from the text, the Wikipedia KB, and the DBpedia KG can be helpful
for ranking entities with respect to the query. Experimental results on two datasets, which build
upon existing TREC document retrieval collections, indicate that the document-based mention
frequency of an entity and the Wikipedia-based query-to-entity similarity are both important
features for ranking. The KG paths, in contrast, play only a minor role in this setting, even when
integrated with a semantic kernel extension. In Chapter 8, we further extend the integration of
query-specific text documents and KG information by extracting not only entities, but also relations
from text. In this exploratory study based on a self-created relevance dataset, we find that
not all extracted relations are relevant with respect to the query, but that they often contain information
not contained within the DBpedia KG. The main insight from the research presented in
this part is that in a query-specific setting, established IR methods for document retrieval provide
an important source of information even for entity-centric tasks, and that a close integration of
relevant text documents and background knowledge is promising.
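The two features highlighted in Part III, document-based mention frequency and query-to-entity similarity, can be sketched as follows. This is an illustration under stated assumptions only: the token-overlap similarity is a simple placeholder for the Wikipedia-based similarity, the function names are hypothetical, and the hand-set weights stand in for a model that a learning-to-rank method would fit from training data.

```python
from collections import Counter


def mention_frequency(entity, linked_docs):
    """Share of all entity mentions in the retrieved documents that refer
    to `entity`. `linked_docs` is a list of per-document entity lists."""
    counts = Counter(e for doc in linked_docs for e in doc)
    total = sum(counts.values())
    return counts[entity] / total if total else 0.0


def query_entity_similarity(query, entity_text):
    """Token-overlap (Jaccard) similarity between the query and a textual
    description of the entity; a placeholder for a retrieval-based score."""
    q = set(query.lower().split())
    t = set(entity_text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def rank_entities(query, linked_docs, entity_texts, weights=(0.5, 0.5)):
    """Rank candidate entities by a weighted sum of the two features.
    The weights are hand-set here; a learned ranker would estimate them."""
    candidates = {e for doc in linked_docs for e in doc}
    scored = []
    for entity in candidates:
        features = (
            mention_frequency(entity, linked_docs),
            query_entity_similarity(query, entity_texts.get(entity, "")),
        )
        score = sum(w * f for w, f in zip(weights, features))
        scored.append((score, entity))
    # Highest score first; ties broken alphabetically for determinism.
    return [entity for score, entity in sorted(scored, key=lambda p: (-p[0], p[1]))]
```

The input to `rank_entities` is assumed to come from an upstream step that retrieves the top-k documents for the query and links their entity mentions, which is outside the scope of this sketch.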
Finally, in the concluding chapter we argue that future research should further address the integration
of KG information with entities and relations extracted from (specific) text documents,
as its potential does not yet seem to be fully explored. The same holds true for better KG
exploration, which has gained some scientific interest in recent years. It seems to us that both aspects
will remain interesting problems in the coming years, also because of the growing importance
of KGs for web search and knowledge modeling in industry and academia.
This entry is part of the university bibliography.
The document is provided by the publication server of the Universitätsbibliothek Mannheim.