Named Entity Recognition and Classification (NERC) is a well-studied NLP task that is typically approached with machine learning algorithms. These algorithms rely on training data, which is usually expensive to create, and the high cost has left many languages without NERC training data. An approach to creating a multilingual named entity (NE) corpus was presented in Wentland et al. (2008). The resulting resource, HeiNER, covers a substantial number of NEs but does not include their types. We present a bootstrapping approach based on Wikipedia’s category system that classifies the NEs contained in HeiNER. It is able to classify more than two million named entities, improving the quality of the resource.
Additional information:
Online resource
This entry is part of the university bibliography.
The document is provided by the publication server of the Mannheim University Library.