Using object detection, NLP, and knowledge bases to understand the message of images


Weiland, Lydia ; Hulpus, Ioana ; Ponzetto, Simone Paolo ; Dietz, Laura



DOI: https://doi.org/10.1007/978-3-319-51814-5_34
URL: http://link.springer.com/chapter/10.1007/978-3-319...
Additional URL: https://www.springerprofessional.de/using-object-d...
Document type: Conference publication
Year of publication: 2017
Book title: MultiMedia Modeling : 23rd International Conference, MMM 2017, Reykjavik, Iceland, January 4-6, 2017, Proceedings, Part II
Journal or series title: Lecture Notes in Computer Science
Volume: 10133
Page range: 405-418
Event title: 23rd International Conference, MMM 2017
Event location: Reykjavik, Iceland
Event date: 04-07 January 2017
Editor: Amsaleg, Laurent
Place of publication: Berlin [et al.]
Publisher: Springer
ISBN: 978-3-319-51813-8 , 978-3-319-51814-5
ISSN: 0302-9743 , 1611-3349
Language of publication: English
Institution: Fakultät für Wirtschaftsinformatik und Wirtschaftsmathematik > Information Systems III: Enterprise Data Analysis (Ponzetto 2016-)
Subject area: 004 Computer science
Abstract: With the increasing amount of multimodal content from social media posts and news articles, there has been an intensified effort towards conceptual labeling and multimodal (topic) modeling of images and their affiliated texts. Nonetheless, the problem of identifying and automatically naming the core abstract message (gist) behind images has received less attention. This problem is especially relevant for the semantic indexing and subsequent retrieval of images. In this paper, we propose a solution that makes use of external knowledge bases such as Wikipedia and DBpedia. Its aim is to leverage complex semantic associations between the image objects and the textual caption in order to uncover the intended gist. The results of our evaluation demonstrate the ability of our proposed approach to detect gist, with a best MAP score of 0.74 when assessed against human annotations. Furthermore, an automatic image tagging and caption generation API is compared to manually set image and caption signals. We show and discuss the difficulty of finding the correct gist, especially for abstract, non-depictable gists, as well as the impact of different types of signals on gist detection quality.
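The abstract reports a best MAP (mean average precision) score of 0.74 against human gist annotations. As an illustration of how that standard retrieval metric is computed (a minimal sketch, not the authors' evaluation code; function names are hypothetical):

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked candidate list against a set of
    relevant items: precision is accumulated at each rank where a hit occurs,
    then normalized by the number of relevant items."""
    hits, score = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(queries):
    """MAP: the mean of average precision over (ranked_list, relevant_set)
    pairs, one pair per query (here: per image whose gist is ranked)."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)

# Toy usage: two images, each with ranked gist candidates and a gold set.
queries = [
    (["fire", "rescue", "heroism"], {"heroism"}),
    (["poverty", "homelessness"], {"poverty", "homelessness"}),
]
print(mean_average_precision(queries))
```

In the paper's setting, each ranked list would contain candidate gist concepts for one image-caption pair, and the relevant set would come from the human annotations.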




This entry is part of the university bibliography.



