Knowledge-rich image gist understanding beyond literal meaning

Weiland, Lydia ; Hulpus, Ioana ; Ponzetto, Simone Paolo ; Effelsberg, Wolfgang ; Dietz, Laura

Document Type: Article
Year of publication: 2018
The title of a journal, publication series: Data & Knowledge Engineering
Volume: 117
Page range: 114 - 132
Place of publication: Amsterdam [et al.]
Publishing house: Elsevier
ISSN: 0169-023X
Publication language: English
Institution: School of Business Informatics and Mathematics > Information Systems III: Enterprise Data Analysis (Ponzetto 2016-)
School of Business Informatics and Mathematics > Practical Computer Science II: Artificial Intelligence (Stuckenschmidt 2009-)
Subject: 004 Computer science, internet
Keywords (English): Image understanding, Language and vision, Entity ranking
Abstract: We investigate the problem of understanding the message (gist) conveyed by images and their captions as found, for instance, on websites or in news articles. To this end, we propose a methodology to capture the meaning of image-caption pairs on the basis of large amounts of machine-readable knowledge that have previously been shown to be highly effective for text understanding. Our method identifies the connotation of objects beyond their denotation: whereas most approaches to image understanding focus on the denotation of objects, i.e., their literal meaning, our work addresses the identification of connotations, i.e., iconic meanings of objects, to understand the message of images. We view image understanding as the task of representing an image-caption pair on the basis of a wide-coverage vocabulary of concepts such as the one provided by Wikipedia, and cast gist detection as a concept-ranking problem with image-caption pairs as queries. Our proposed algorithm brings together aspects of entity linking and clustering, subgraph selection, semantic relatedness, and learning-to-rank in a novel way. In addition to this novel task and a complete evaluation of our approach, we introduce a novel dataset to foster further research on this problem. To enable a thorough investigation of the problem of gist understanding, we produce a gold standard of over 300 image-caption pairs and over 8000 gist annotations covering a wide variety of topics at different levels of abstraction. We use this dataset to experimentally benchmark the contribution of different kinds of signals from heterogeneous sources, namely image and text. The best result, with a Mean Average Precision (MAP) of 0.69, indicates that by combining both dimensions we are able to better understand the meaning of our image-caption pairs than when using language or vision information alone. Our supervised approach relies on the availability of human-annotated gold standard datasets.
Annotating images with possibly complex topic labels is arguably a very time-consuming task that must rely on expert human annotators. We accordingly investigate whether parts of this process could be automated using automatic image annotation and caption generation techniques. Our results indicate the general feasibility of an end-to-end approach to gist detection when replacing one of the two dimensions with automatically generated input, i.e., using automatically generated image tags or generated captions. However, we also show experimentally that state-of-the-art image and text understanding is better at understanding literal meanings of image-caption pairs, with non-literal pairs being generally more difficult to detect, thus paving the way for future work on understanding the message of images beyond their literal content.
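Since the abstract casts gist detection as a concept-ranking problem evaluated with Mean Average Precision (MAP), the metric itself can be illustrated with a short sketch. This is a standard MAP computation for ranked lists against a set of relevant (gold) concepts, not the authors' actual evaluation code; the concept names are invented for illustration.

```python
def average_precision(ranked, relevant):
    """AP for one query: mean of precision@k at each rank k
    where a relevant concept appears."""
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for k, concept in enumerate(ranked, start=1):
        if concept in relevant:
            hits += 1
            score += hits / k  # precision at this rank
    return score / len(relevant)

def mean_average_precision(queries):
    """MAP: average AP over all (ranked_list, relevant_set) queries."""
    return sum(average_precision(r, g) for r, g in queries) / len(queries)

# Hypothetical image-caption queries with gold gist concepts:
queries = [
    (["Climate change", "Polar bear", "Arctic"], {"Climate change", "Arctic"}),
    (["Soccer", "Politics", "Election"], {"Election"}),
]
print(round(mean_average_precision(queries), 3))
```

With the first query, "Climate change" is found at rank 1 and "Arctic" at rank 3, giving AP = (1/1 + 2/3)/2; the second query finds its single gold concept at rank 3, giving AP = 1/3.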

This entry is part of the university bibliography.
