Learning regular expressions for the extraction of product attributes from E-commerce microdata
Petrovski, Petar
;
Bryl, Volha
;
Bizer, Christian
URL:
|
http://ceur-ws.org/Vol-1267/LD4IE2014_Petrovski.pd...
|
Additional URL:
|
https://www.semanticscholar.org/paper/Learning-Reg...
|
Document Type:
|
Conference or workshop publication
|
Year of publication:
|
2014
|
Book title:
|
LD4IE 2014 : Proceedings of the Second International Workshop on Linked Data for Information Extraction (LD4IE 2014) co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 20, 2014
|
The title of a journal, publication series:
|
CEUR Workshop Proceedings
|
Volume:
|
1267
|
Page range:
|
45-54
|
Conference title:
|
LD4IE@ISWC
|
Location of the conference venue:
|
Riva del Garda, Italy
|
Date of the conference:
|
20.10.2014
|
Editor:
|
Gentile, Anna Lisa
;
Zhang, Ziqi
;
d'Amato, Claudia
;
Paulheim, Heiko
|
Place of publication:
|
Aachen, Germany
|
Publishing house:
|
RWTH Aachen
|
ISSN:
|
1613-0073
|
Publication language:
|
English
|
Institution:
|
School of Business Informatics and Mathematics > Information Systems V: Web-based Systems (Bizer 2012-)
|
Subject:
|
004 Computer science, internet
|
Abstract:
|
A large number of e-commerce websites have started to markup their products using standards such as Microdata, Microformats, and RDFa. However, the markup is mostly not as fine-grained as desirable for applications and mostly consists of free text properties. This paper discusses the challenges that arise in the task of matching descriptions of electronic products from several thousand e-shops that offer Microdata markup. Specifically, our goal is to extract product attributes from product offers, by means of regular expressions, in order to build well structured product specifications. For this purpose we present a technique for learning regular expressions. We evaluate our attribute extraction approach using 1.9 million product offers from 9,240 e-shops which we extracted from the Common Crawl 2012, a large public Web corpus. Our results show that with our approach we are able to reach a similar matching quality as with manually defined regular expressions.
|
Additional information:
|
Online resource
|
| This entry is part of the university bibliography. |
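The abstract describes extracting product attributes from free-text offers by means of regular expressions. As a rough illustration of that idea only, the sketch below applies two hand-written patterns to a made-up offer title. The pattern set, the attribute names (`storage_gb`, `screen_inches`), the function `extract_attributes`, and the sample text are all assumptions for illustration; the paper's contribution is learning such expressions automatically, which this sketch does not attempt.

```python
import re

# Hypothetical, hand-written patterns (not taken from the paper).
# Each pattern captures the attribute value in group 1.
ATTRIBUTE_PATTERNS = {
    "storage_gb": re.compile(r'(\d+)\s*GB\b', re.IGNORECASE),
    "screen_inches": re.compile(r'(\d+(?:\.\d+)?)\s*(?:inch(?:es)?|")', re.IGNORECASE),
}


def extract_attributes(text):
    """Return the first captured value of each pattern found in the text."""
    found = {}
    for name, pattern in ATTRIBUTE_PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[name] = match.group(1)
    return found


# Made-up free-text offer, standing in for a Microdata name/description property.
offer = "Acme Phone X 64GB, 5.5 inch display, black"
print(extract_attributes(offer))
```

Values are returned as strings; converting them to numbers or normalizing units would be a separate step in building a structured product specification.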
ORCID:
Petrovski, Petar, Bryl, Volha and Bizer, Christian ORCID: https://orcid.org/0000-0003-2367-0237
Gentile, Anna Lisa, Zhang, Ziqi, d'Amato, Claudia and Paulheim, Heiko ORCID: https://orcid.org/0000-0003-4386-8195