Self-refinement strategies for LLM-based product attribute value extraction


Brinkmann, Alexander ; Bizer, Christian



DOI: https://doi.org/10.18420/BTW2025-132
URL: https://dl.gi.de/items/1c6f9643-a1de-447a-a348-414...
URN: urn:nbn:de:bsz:180-madoc-708162
Document Type: Conference or workshop publication
Year of publication: 2025
Book title: Datenbanksysteme für Business, Technologie und Web (BTW 2025) : Workshopband, 03.-07. März 2025, Bamberg, Deutschland
Journal / publication series: GI-Edition : Lecture Notes in Informatics. Proceedings
Volume: P-363
Page range: 291-304
Conference title: BTW 2025, 21st Conference on Database Systems for Business, Technology and Web
Location of the conference venue: Bamberg, Germany
Date of the conference: 03.-07.03.2025
Editors: Binnig, Carsten ; Henrich, Andreas ; Nicklas, Daniela ; Schüle, Maximilian E. ; Meyer-Wegener, Klaus
Place of publication: Bonn
Publishing house: Gesellschaft für Informatik (GI)
ISSN: 1617-5468 , 2944-7682
Related URLs:
Publication language: English
Institution: School of Business Informatics and Mathematics > Information Systems V: Web-based Systems (Bizer 2012-)
Pre-existing license: Creative Commons Attribution, Share Alike 4.0 International (CC BY-SA 4.0)
Subject: 004 Computer science, internet
Keywords (English): information extraction , product attribute value extraction , self-refinement , Large Language Models , e-commerce
Abstract: Structured product data, represented as attribute-value pairs, is crucial for e-commerce platforms to enable features such as faceted product search and attribute-based product comparison. However, vendors often supply unstructured product descriptions, necessitating attribute value extraction to ensure data consistency and usability. Large language models (LLMs), including OpenAI's GPT-4o, have demonstrated their potential for product attribute value extraction in few-shot scenarios. Recent research has shown that self-refinement techniques can improve the performance of LLMs on tasks such as code generation and text-to-SQL translation. For other tasks, applying these techniques has only increased costs, due to the processing of additional tokens, without improving performance. This paper investigates applying two self-refinement techniques, error-based prompt rewriting and self-correction, to the product attribute value extraction task. The self-refinement techniques are evaluated across zero-shot, few-shot in-context learning, and fine-tuning scenarios. Experimental results reveal that both self-refinement techniques have a marginal impact on the performance of GPT-4o across the different scenarios while significantly increasing processing costs. For attribute value extraction scenarios involving training data, fine-tuning yields the highest performance, and the ramp-up costs of fine-tuning are amortized as the number of product descriptions grows.
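The self-correction technique named in the abstract can be understood as a two-pass prompting loop: the model first extracts attribute-value pairs from a product description, then reviews and corrects its own output. The following minimal Python sketch illustrates the idea only; the function names, prompts, and the canned stand-in for the LLM call are illustrative assumptions, not the paper's actual implementation or the OpenAI API.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call (e.g. to GPT-4o).

    Returns canned responses here so the sketch is self-contained;
    a real implementation would call an LLM API instead.
    """
    if "Review the extracted" in prompt:
        # Second pass (self-correction): the model fixes its own error.
        return json.dumps({"Brand": "Acme", "Screen Size": "15.6 inch"})
    # First pass (initial extraction): contains a deliberate error.
    return json.dumps({"Brand": "Acme", "Screen Size": "15 inch"})

def extract_attributes(description: str, attributes: list[str]) -> dict:
    """Pass 1: extract the target attributes as JSON."""
    prompt = (f"Extract the attributes {attributes} from this product "
              f"description and answer with JSON:\n{description}")
    return json.loads(call_llm(prompt))

def self_correct(description: str, extracted: dict) -> dict:
    """Pass 2: ask the model to verify and correct its own extraction."""
    prompt = (f"Review the extracted attribute-value pairs "
              f"{json.dumps(extracted)} against the product description:\n"
              f"{description}\nAnswer with corrected JSON.")
    return json.loads(call_llm(prompt))

description = "Acme UltraBook with a 15.6 inch display"
first_pass = extract_attributes(description, ["Brand", "Screen Size"])
final = self_correct(description, first_pass)
print(final)  # the corrected screen size from the second pass
```

Note that every self-correction pass reprocesses the full description and the first-pass output, which is exactly the token overhead the paper weighs against the (marginal) accuracy gains.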




This entry is part of the university bibliography.

The document is provided by the publication server of the University Library Mannheim.








