Extracting attribute-value pairs from product specifications on the web
Petrovski, Petar; Bizer, Christian
DOI: https://doi.org/10.1145/3106426.3106449
URL: http://doi.acm.org/10.1145/3106426.3106449
Additional URL: http://webdatacommons.org/productcorpus/paper/attr...
Document type: Conference publication
Year of publication: 2017
Book title: WI '17 Proceedings of the International Conference on Web Intelligence: Leipzig, Germany, August 23-26, 2017
Page range: 558-565
Event title: International Conference on Web Intelligence 2017
Event location: Leipzig, Germany
Event date: August 23-26, 2017
Editor: Sheth, Amit
Place of publication: New York, NY [et al.]
Publisher: ACM
ISBN: 978-1-4503-4951-2
Language of publication: English
Institution: School of Business Informatics and Mathematics > Information Systems V: Web-based Systems (Bizer 2012-)
Subject area: 004 Computer science
Keywords (English): Information Extraction, Product Data, Web Tables
Abstract:
Comparison shopping portals integrate product offers from large numbers of e-shops in order to support consumers in their buying decisions. Product offers often consist of a title and a free-text product description, both describing product attributes that the specific vendor considers relevant. In addition, product offers might contain structured or semi-structured product specifications in the form of HTML tables and HTML lists. As product specifications often cover more product attributes than free-text descriptions, being able to extract attribute-value pairs from these specifications is a critical prerequisite for achieving good results in tasks such as product matching, product categorisation, faceted product search, and product recommendation. In this paper, we present an approach for extracting attribute-value pairs from product specifications on the Web. We use supervised learning to classify the HTML tables and HTML lists within a web page as product specifications or not. To extract attribute-value pairs from the HTML fragments identified by the specification detector, we again use supervised learning, this time to classify columns as attribute columns or value columns. Compared to DEXTER, the current state-of-the-art approach for extracting attribute-value pairs from product specifications, we introduce several new features for specification detection and support the extraction of attribute-value pairs from specifications having more than two columns. This allows us to improve the F-score by up to 10% for extracting attribute-value pairs from tables and by up to 3% for lists. In addition, we report the results of using duplicate-based schema matching to align the product attribute schemata of 32 different e-shops. This experiment confirms the suitability of duplicate-based schema matching for product data integration.
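The abstract mentions duplicate-based schema matching for aligning the attribute schemata of different e-shops. As a rough illustration of the general idea only (not the authors' implementation), the Python sketch below aligns the attributes of two shops by how often their values agree across offer pairs already known to describe the same product. The function names `normalize` and `match_schemata`, the whitespace/case normalization, the 0.5 agreement threshold, and the greedy one-to-one assignment are all hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import product


def normalize(value: str) -> str:
    """Hypothetical normalization: lower-case and collapse whitespace."""
    return " ".join(value.lower().split())


def match_schemata(duplicates, threshold=0.5):
    """Sketch of duplicate-based schema matching between two shops.

    duplicates: list of (offer_a, offer_b) pairs known to describe the same
                product; each offer is a dict mapping attribute -> value.
    Returns a dict mapping attributes of shop A to attributes of shop B.
    """
    agree = defaultdict(int)  # (attr_a, attr_b) -> duplicates with equal values
    seen = defaultdict(int)   # (attr_a, attr_b) -> duplicates where both occur
    for offer_a, offer_b in duplicates:
        for (attr_a, val_a), (attr_b, val_b) in product(offer_a.items(), offer_b.items()):
            seen[(attr_a, attr_b)] += 1
            if normalize(val_a) == normalize(val_b):
                agree[(attr_a, attr_b)] += 1

    # Rank candidate attribute pairs by their value-agreement rate.
    candidates = sorted(
        ((agree[pair] / seen[pair], pair) for pair in seen),
        reverse=True,
    )

    # Greedy one-to-one assignment (assumption): take the best-scoring
    # pairs first and stop once scores fall below the threshold.
    mapping, used_b = {}, set()
    for score, (attr_a, attr_b) in candidates:
        if score < threshold:
            break
        if attr_a not in mapping and attr_b not in used_b:
            mapping[attr_a] = attr_b
            used_b.add(attr_b)
    return mapping


if __name__ == "__main__":
    # Two made-up duplicate offer pairs from two hypothetical shops.
    duplicates = [
        ({"Brand": "Sony", "Screen size": "55 in"},
         {"Manufacturer": "SONY", "Display": "55 in"}),
        ({"Brand": "LG", "Screen size": "65 in"},
         {"Manufacturer": "LG", "Display": "65 in"}),
    ]
    print(match_schemata(duplicates))
```

On the made-up data, `Brand` aligns with `Manufacturer` and `Screen size` with `Display`, because their values agree for every duplicate pair while all other attribute combinations never agree. Exact string agreement is a deliberately simple signal here; generic values such as "Yes" or "No" can agree by coincidence, so a practical matcher would likely need similarity measures and more evidence per attribute pair.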
This entry is part of the university bibliography.