Learning regular expressions for the extraction of product attributes from E-commerce microdata


Petrovski, Petar ; Bryl, Volha ; Bizer, Christian



URL: http://ceur-ws.org/Vol-1267/LD4IE2014_Petrovski.pd...
Additional URL: https://www.semanticscholar.org/paper/Learning-Reg...
Document Type: Conference or workshop publication
Year of publication: 2014
Book title: LD4IE 2014 : Proceedings of the Second International Workshop on Linked Data for Information Extraction (LD4IE 2014) co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 20, 2014
The title of a journal, publication series: CEUR Workshop Proceedings
Volume: 1267
Page range: 45-54
Conference title: LD4IE@ISWC
Location of the conference venue: Riva del Garda, Italy
Date of the conference: 20.10.2014
Editors: Gentile, Anna Lisa ; Zhang, Ziqi ; D'Amato, Claudia ; Paulheim, Heiko
Place of publication: Aachen
Publishing house: RWTH
ISSN: 1613-0073
Publication language: English
Institution: School of Business Informatics and Mathematics > Wirtschaftsinformatik V (Bizer)
Subject: 004 Computer science, internet
Abstract: A large number of e-commerce websites have started to markup their products using standards such as Microdata, Microformats, and RDFa. However, the markup is mostly not as fine-grained as desirable for applications and mostly consists of free text properties. This paper discusses the challenges that arise in the task of matching descriptions of electronic products from several thousand e-shops that offer Microdata markup. Specifically, our goal is to extract product attributes from product offers, by means of regular expressions, in order to build well structured product specifications. For this purpose we present a technique for learning regular expressions. We evaluate our attribute extraction approach using 1.9 million product offers from 9,240 e-shops which we extracted from the Common Crawl 2012, a large public Web corpus. Our results show that with our approach we are able to reach a similar matching quality as with manually defined regular expressions.
Additional information: Online resource

This entry is part of the university bibliography.





ORCID: Bizer, Christian: 0000-0003-2367-0237 ; Paulheim, Heiko: 0000-0003-4386-8195
