Reproducible extraction of cross-lingual topics (rectr)

Chan, Chung-hong ; Zeng, Jing ; Wessler, Hartmut ; Jungblut, Marc ; Welbers, Kasper ; Bajjalieh, Joseph W. ; Atteveldt, Wouter van ; Althaus, Scott L.

Document Type: Article
Year of publication: 2020
Journal or publication series: Communication Methods and Measures
Volume: 14
Issue number: 4
Page range: 285-305
Place of publication: Philadelphia, PA
Publishing house: Routledge, Taylor & Francis Group
ISSN: 1931-2458, 1931-2466
Publication language: English
Institution: Außerfakultäre Einrichtungen > Mannheim Centre for European Social Research - Research Department B
School of Humanities > Medien- und Kommunikationswissenschaft (Wessler 2007-)
Subject: 320 Political science
Abstract: With global media content databases and online content being available, analyzing topical structures in different languages simultaneously has become an urgent computational task. Some previous studies have analyzed topics in a multilingual corpus by translating all items into a single language using a machine translation service, such as Google Translate. We argue that this method is not reproducible in the long run and propose a new method – Reproducible Extraction of Cross-lingual Topics Using R (rectr). Our method utilizes open-source aligned word embeddings to understand the cross-lingual meanings of words and has a mechanism to normalize residual influence from language differences. We present a benchmark that compares the topics extracted from a corpus of English, German, and French news using our method with methods used in the literature. We show that our method is not only reproducible but can also generate high-quality cross-lingual topics. We demonstrate how our method can be applied in tracking news topics across time and languages.

This entry is part of the university bibliography.
