Using LLMs for the extraction and normalization of product attribute values
Brinkmann, Alexander
;
Baumann, Nick
;
Bizer, Christian
DOI:
|
https://doi.org/10.1007/978-3-031-70626-4_15
|
URL:
|
https://link.springer.com/chapter/10.1007/978-3-03...
|
Dokumenttyp:
|
Konferenzveröffentlichung
|
Erscheinungsjahr:
|
2024
|
Buchtitel:
|
Advances in databases and information systems : 28th European Conference, ADBIS 2024, Bayonne, France, August 28-31, 2024 ; Proceedings
|
Titel einer Zeitschrift oder einer Reihe:
|
Lecture Notes in Computer Science
|
Band/Volume:
|
14918
|
Seitenbereich:
|
217-230
|
Veranstaltungstitel:
|
28th European Conference on Advances in Databases and Information Systems (ADBIS 2024)
|
Veranstaltungsort:
|
Bayonne, France
|
Veranstaltungsdatum:
|
28.-31.08.2024
|
Herausgeber:
|
Tekli, Joe
;
Gamper, Johann
;
Chbeir, Richard
;
Manolopoulos, Yannis
|
Ort der Veröffentlichung:
|
Berlin [u.a.]
|
Verlag:
|
Springer
|
ISBN:
|
978-3-031-70628-8 , 978-3-031-70626-4
|
ISSN:
|
0302-9743 , 1611-3349
|
Sprache der Veröffentlichung:
|
Englisch
|
Einrichtung:
|
Fakultät für Wirtschaftsinformatik und Wirtschaftsmathematik > Information Systems V: Web-based Systems (Bizer 2012-)
|
Fachgebiet:
|
004 Informatik
|
Freie Schlagwörter (Englisch):
|
information extraction , value normalization , Large Language Models
|
Abstract:
|
Product offers on e-commerce websites often consist of a product title and a textual product description. In order to enable features such as faceted product search or to generate product comparison tables, it is necessary to extract structured attribute-value pairs from the unstructured product titles and descriptions and to normalize the extracted values to a single, unified scale for each attribute. This paper explores the potential of using large language models (LLMs), such as GPT-3.5 and GPT-4, to extract and normalize attribute values from product titles and descriptions. We experiment with different zero-shot and few-shot prompt templates for instructing LLMs to extract and normalize attribute-value pairs. We introduce the Web Data Commons - Product Attribute Value Extraction (WDC-PAVE) benchmark dataset for our experiments. WDC-PAVE consists of product offers from 59 different websites which provide schema.org annotations. The offers belong to five different product categories, each with a specific set of attributes. The dataset provides manually verified attribute-value pairs in two forms: (i) directly extracted values and (ii) normalized attribute values. The normalization of the attribute values requires systems to perform the following types of operations: name expansion, generalization, unit of measurement conversion, and string wrangling. Our experiments demonstrate that GPT-4 outperforms the PLM-based extraction methods SU-OpenTag, AVEQA, and MAVEQA by 10%, achieving an F1-score of 91%. For the extraction and normalization of product attribute values, GPT-4 achieves a similar performance to the extraction scenario, while being particularly strong at string wrangling and name expansion.
|
| Dieser Eintrag ist Teil der Universitätsbibliographie. |
Suche Autoren in
Sie haben einen Fehler gefunden? Teilen Sie uns Ihren Korrekturwunsch bitte hier mit: E-Mail
Actions (login required)
|
Eintrag anzeigen |
|