Entities as topic labels : combining entity linking and labeled LDA to improve topic interpretability and evaluability


Lauscher, Anne ; Nanni, Federico ; Ruiz Fabo, Pablo ; Ponzetto, Simone Paolo


Full text (PDF): 4-lauscher_et_al.pdf - Published (951kB)

URL: https://ub-madoc.bib.uni-mannheim.de/41843
Additional URL: http://www.ai-lc.it/index.php?slab=ijcol_v2n2
URN: urn:nbn:de:bsz:180-madoc-418431
Document Type: Article
Year of publication: 2016
Journal or publication series: IJCol - Italian journal of computational linguistics
Volume: 2
Issue number: 2
Page range: 67-88
Place of publication: Torino
Publishing house: Accademia University Press
ISBN: 978-88-99982-26-3, 88-99982-26-0
Publication language: English
Institution: School of Business Informatics and Mathematics > Wirtschaftsinformatik III (Ponzetto 2016-)
Subject: 004 Computer science, internet
Keywords (English): Digital Humanities, Natural Language Processing, Topic Models
Abstract: Digital humanities scholars strongly need a corpus exploration method that provides topics easier to interpret than standard LDA topic models. To move towards this goal, here we propose a combination of two techniques, called Entity Linking and Labeled LDA. Our method identifies in an ontology a series of descriptive labels for each document in a corpus. Then it generates a specific topic for each label. Having a direct relation between topics and labels makes interpretation easier; using an ontology as background knowledge limits label ambiguity. As our topics are described with a limited number of clear-cut labels, they promote interpretability and support the quantitative evaluation of the obtained results. We illustrate the potential of the approach by applying it to three datasets, namely the transcription of speeches from the European Parliament fifth mandate, the Enron Corpus and the Hillary Clinton Email Dataset. While some of these resources have already been adopted by the natural language processing community, they still hold a large potential for humanities scholars, part of which could be exploited in studies that will adopt the fine-grained exploration method presented in this paper.
Additional information: Online resource

This entry is part of the university bibliography.

The document is provided by the publication server of the Mannheim University Library.




Citation example:

Lauscher, Anne ; Nanni, Federico (ORCID: 0000-0003-2484-4331) ; Ruiz Fabo, Pablo ; Ponzetto, Simone Paolo (2016) Entities as topic labels : combining entity linking and labeled LDA to improve topic interpretability and evaluability. IJCol - Italian journal of computational linguistics, Torino, 2 (2), 67-88. [Article] Open Access

