Integrating Product Data from Websites offering Microdata Markup


Petrovski, Petar ; Bryl, Volha ; Bizer, Christian



DOI: https://doi.org/10.1145/2567948.2579704
URL: http://wwwconference.org/proceedings/www2014/compa...
Additional URL: http://dws.informatik.uni-mannheim.de/fileadmin/le...
Document type: Conference publication
Year of publication: 2014
Book title: 23rd International World Wide Web Conference, WWW '14, Seoul, Republic of Korea, April 7-11, 2014, Companion Volume
Page range: 1299-1304
Event title: 4th Workshop on Data Extraction and Object Search (DEOS2014)
Event date: April 2014
Editor: Chung, Chin-Wan
Place of publication: Geneva
Publisher: ACM
ISBN: 978-1-4503-2745-9
Language of publication: English
Institution: Fakultät für Wirtschaftsinformatik und Wirtschaftsmathematik > Information Systems V: Web-based Systems (Bizer 2012-)
Subject: 004 Computer science
Keywords (English): Microdata, Information Extraction, Data Integration
Abstract: Large numbers of websites have started to mark up their content using standards such as Microdata, Microformats, and RDFa. The marked-up content elements comprise descriptions of people, organizations, places, events, products, ratings, and reviews. This development has accelerated in recent years as major search engines such as Google, Bing, and Yahoo! use the markup to improve their search results. Embedding semantic markup makes it easier to identify content elements on webpages. However, the markup is mostly not as fine-grained as desirable for applications that aim to integrate data from large numbers of websites. This paper discusses the challenges that arise in the task of integrating descriptions of electronic products from several thousand e-shops that offer Microdata markup. We present a solution for each step of the data integration process, including Microdata extraction, product classification, product feature extraction, identity resolution, and data fusion. We evaluate our processing pipeline using 1.9 million product offers from 9240 e-shops, which we extracted from the Common Crawl 2012, a large public Web corpus.
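
The first pipeline step named in the abstract, Microdata extraction, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `ProductMicrodataParser`, the function `extract_products`, and the sample HTML are hypothetical, and the sketch assumes a simplified reading of Microdata in which properties of nested items (such as an Offer) are flattened into the enclosing Product and only the first text node of each property is kept.

```python
from html.parser import HTMLParser

# Hypothetical sample page fragment using schema.org Microdata markup.
SAMPLE = """
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">Acme Phone X</span>
  <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
    <meta itemprop="priceCurrency" content="EUR">
    <span itemprop="price">199.00</span>
  </div>
</div>
"""


class ProductMicrodataParser(HTMLParser):
    """Collect itemprop values for each element typed as schema.org Product.

    Simplifying assumptions (not part of the Microdata specification):
    nested item scopes are flattened into the most recent Product, and an
    item is never explicitly closed.
    """

    def __init__(self):
        super().__init__()
        self.products = []         # one dict of properties per Product
        self._current_prop = None  # property waiting for its text value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            # Start a new record only for Product items; other scopes
            # (e.g. Offer) just contribute their properties.
            if (attrs.get("itemtype") or "").endswith("/Product"):
                self.products.append({})
            return
        prop = attrs.get("itemprop")
        if prop and self.products:
            if attrs.get("content") is not None:
                # Value carried in the attribute, as with <meta>.
                self.products[-1][prop] = attrs["content"]
                self._current_prop = None
            else:
                self._current_prop = prop

    def handle_data(self, data):
        text = data.strip()
        if self._current_prop and text:
            self.products[-1][self._current_prop] = text
            self._current_prop = None

    def handle_endtag(self, tag):
        # Stop waiting for text once any element closes.
        self._current_prop = None


def extract_products(html):
    """Return a list of property dicts, one per Product found in html."""
    parser = ProductMicrodataParser()
    parser.feed(html)
    return parser.products


if __name__ == "__main__":
    print(extract_products(SAMPLE))
```

The flat output (name, currency, and price in one record) also hints at the limitation the abstract points out: the markup identifies a product and a few coarse properties, while finer-grained features typically still have to be extracted from free-text fields such as the name.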




This entry is part of the university bibliography.




ORCID: Bizer, Christian: 0000-0003-2367-0237
