Entities as topic labels : combining entity linking and labeled LDA to improve topic interpretability and evaluability
Lauscher, Anne ; Nanni, Federico ; Ruiz Fabo, Pablo ; Ponzetto, Simone Paolo
| URL: | https://madoc.bib.uni-mannheim.de/41843 |
| Additional URL: | http://www.ai-lc.it/index.php?slab=ijcol_v2n2 |
| URN: | urn:nbn:de:bsz:180-madoc-418431 |
| Document type: | Journal article |
| Year of publication: | 2016 |
| Title of journal or series: | IJCol - Italian journal of computational linguistics |
| Volume: | 2 |
| Issue: | 2 |
| Page range: | 67-88 |
| Place of publication: | Torino |
| Publisher: | Accademia University Press |
| ISBN: | 978-88-99982-26-3, 88-99982-26-0 |
| Language of publication: | English |
| Institution: | Fakultät für Wirtschaftsinformatik und Wirtschaftsmathematik > Information Systems III: Enterprise Data Analysis (Ponzetto 2016-) |
| Subject: | 004 Computer science |
| Keywords (English): | Digital Humanities, Natural Language Processing, Topic Models |
| Abstract: | Digital humanities scholars strongly need a corpus exploration method that provides topics easier to interpret than standard LDA topic models. To move towards this goal, here we propose a combination of two techniques, called Entity Linking and Labeled LDA. Our method identifies in an ontology a series of descriptive labels for each document in a corpus. Then it generates a specific topic for each label. Having a direct relation between topics and labels makes interpretation easier; using an ontology as background knowledge limits label ambiguity. As our topics are described with a limited number of clear-cut labels, they promote interpretability and support the quantitative evaluation of the obtained results. We illustrate the potential of the approach by applying it to three datasets, namely the transcription of speeches from the European Parliament fifth mandate, the Enron Corpus and the Hillary Clinton Email Dataset. While some of these resources have already been adopted by the natural language processing community, they still hold a large potential for humanities scholars, part of which could be exploited in studies that will adopt the fine-grained exploration method presented in this paper. |
| Additional information: | Online resource |
| This entry is part of the university bibliography. |
| The document is provided by the publication server of the Mannheim University Library. |
ORCID:
Lauscher, Anne ; Nanni, Federico ORCID: 0000-0003-2484-4331 ; Ruiz Fabo, Pablo ; Ponzetto, Simone Paolo ORCID: 0000-0001-7484-2049