Learning regular expressions for the extraction of product attributes from E-commerce microdata
Petrovski, Petar
;
Bryl, Volha
;
Bizer, Christian
URL:
|
http://ceur-ws.org/Vol-1267/LD4IE2014_Petrovski.pd...
|
Additional URL:
|
https://www.semanticscholar.org/paper/Learning-Reg...
|
Document type:
|
Conference publication
|
Year of publication:
|
2014
|
Book title:
|
LD4IE 2014 : Proceedings of the Second International Workshop on Linked Data for Information Extraction (LD4IE 2014) co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 20, 2014
|
Title of a journal or series:
|
CEUR Workshop Proceedings
|
Volume:
|
1267
|
Page range:
|
45-54
|
Event title:
|
LD4IE@ISWC
|
Event location:
|
Riva del Garda, Italy
|
Event date:
|
October 20, 2014
|
Editors:
|
Gentile, Anna Lisa
;
Zhang, Ziqi
;
d'Amato, Claudia
;
Paulheim, Heiko
|
Place of publication:
|
Aachen, Germany
|
Publisher:
|
RWTH Aachen
|
ISSN:
|
1613-0073
|
Language of publication:
|
English
|
Institution:
|
Fakultät für Wirtschaftsinformatik und Wirtschaftsmathematik > Information Systems V: Web-based Systems (Bizer 2012-)
|
Subject:
|
004 Computer science
|
Abstract:
|
A large number of e-commerce websites have started to markup their products using standards such as Microdata, Microformats, and RDFa. However, the markup is mostly not as fine-grained as desirable for applications and mostly consists of free text properties. This paper discusses the challenges that arise in the task of matching descriptions of electronic products from several thousand e-shops that offer Microdata markup. Specifically, our goal is to extract product attributes from product offers, by means of regular expressions, in order to build well structured product specifications. For this purpose we present a technique for learning regular expressions. We evaluate our attribute extraction approach using 1.9 million product offers from 9,240 e-shops which we extracted from the Common Crawl 2012, a large public Web corpus. Our results show that with our approach we are able to reach a similar matching quality as with manually defined regular expressions.
|
Additional information:
|
Online resource
|
| This entry is part of the university bibliography. |
ORCID:
Petrovski, Petar, Bryl, Volha and Bizer, Christian ORCID: https://orcid.org/0000-0003-2367-0237
Gentile, Anna Lisa, Zhang, Ziqi, d'Amato, Claudia and Paulheim, Heiko ORCID: https://orcid.org/0000-0003-4386-8195