Using object detection, NLP, and knowledge bases to understand the message of images


Weiland, Lydia ; Hulpus, Ioana ; Ponzetto, Simone Paolo ; Dietz, Laura



DOI: https://doi.org/10.1007/978-3-319-51814-5_34
URL: http://link.springer.com/chapter/10.1007/978-3-319...
Additional URL: https://www.springerprofessional.de/using-object-d...
Document Type: Conference or workshop publication
Year of publication: 2017
Book title: MultiMedia Modeling : 23rd International Conference, MMM 2017, Reykjavik, Iceland, January 4-6, 2017, Proceedings, Part II
Journal or publication series: Lecture Notes in Computer Science
Volume: 10133
Page range: 405-418
Conference title: MMM 2017, 23rd International Conference on MultiMedia Modeling
Location of the conference venue: Reykjavik, Iceland
Date of the conference: 04-06 January 2017
Editor: Amsaleg, Laurent
Place of publication: Berlin [et al.]
Publishing house: Springer
ISBN: 978-3-319-51813-8 , 978-3-319-51814-5
ISSN: 0302-9743 , 1611-3349
Publication language: English
Institution: School of Business Informatics and Mathematics > Information Systems III: Enterprise Data Analysis (Ponzetto 2016-)
Subject: 004 Computer science, internet
Abstract: With the increasing amount of multimodal content from social media posts and news articles, there has been an intensified effort towards conceptual labeling and multimodal (topic) modeling of images and their affiliated texts. Nonetheless, the problem of identifying and automatically naming the core abstract message (gist) behind images has received less attention. This problem is especially relevant for the semantic indexing and subsequent retrieval of images. In this paper, we propose a solution that uses external knowledge bases such as Wikipedia and DBpedia to leverage complex semantic associations between the image objects and the textual caption and thereby uncover the intended gist. Our evaluation demonstrates that the proposed approach detects gist with a best MAP score of 0.74 when assessed against human annotations. Furthermore, an automatic image tagging and caption generation API is compared to manually assigned image and caption signals. We show and discuss the difficulty of finding the correct gist, especially for abstract, non-depictable gists, as well as the impact of different types of signals on gist detection quality.
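The abstract's central idea, connecting concepts detected in the image and entities mentioned in the caption through a knowledge base and ranking candidate gist concepts by how strongly they associate with both signals, can be illustrated with a minimal self-contained sketch. The toy graph, the dbr: concept names, and the inverse-distance scoring below are illustrative assumptions only; they do not reproduce the paper's actual model, features, or DBpedia data.

# Illustrative sketch only: rank candidate "gist" concepts by how closely they
# connect, in a toy knowledge graph, to concepts detected in the image and in
# the caption. Graph, concept names, and scoring are hypothetical stand-ins.
from collections import deque

# Toy undirected knowledge graph (a small stand-in for a DBpedia neighbourhood).
KB_EDGES = {
    "dbr:Polar_bear": ["dbr:Arctic", "dbr:Sea_ice"],
    "dbr:Sea_ice": ["dbr:Arctic", "dbr:Climate_change"],
    "dbr:Arctic": ["dbr:Climate_change"],
    "dbr:Climate_change": ["dbr:Global_warming"],
    "dbr:Global_warming": [],
}

def neighbours(node):
    """Collect neighbours in both directions of the (undirected) toy graph."""
    out = set(KB_EDGES.get(node, []))
    out.update(n for n, adj in KB_EDGES.items() if node in adj)
    return out

def shortest_path_length(source, target):
    """Breadth-first search distance in the toy graph (None if unreachable)."""
    if source == target:
        return 0
    seen, frontier = {source}, deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nxt in neighbours(node):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None

def gist_score(candidate, image_concepts, caption_concepts):
    """Score a candidate gist by its graph closeness to both modalities."""
    signals = image_concepts + caption_concepts
    distances = [shortest_path_length(c, candidate) for c in signals]
    reachable = [d for d in distances if d is not None]
    if not reachable:
        return 0.0
    # Candidates that sit close to many signals score higher.
    return sum(1.0 / (1 + d) for d in reachable) / len(signals)

if __name__ == "__main__":
    image_concepts = ["dbr:Polar_bear", "dbr:Sea_ice"]   # from object detection
    caption_concepts = ["dbr:Arctic"]                    # from caption entity linking
    candidates = ["dbr:Climate_change", "dbr:Global_warming"]
    ranked = sorted(candidates,
                    key=lambda c: gist_score(c, image_concepts, caption_concepts),
                    reverse=True)
    for c in ranked:
        print(c, round(gist_score(c, image_concepts, caption_concepts), 3))

In this toy example, dbr:Climate_change ranks above dbr:Global_warming because it lies closer in the graph to both the detected objects and the caption entity; as the abstract notes, such abstract, non-depictable gists are exactly the cases where detection is hardest.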




This entry is part of the university bibliography.



