Bias mitigation for large language models using adversarial learning


Ernst, Jasmina S. ; Marton, Sascha ; Brinkmann, Jannik ; Vellasques, Eduardo ; Foucard, Damien ; Kraemer, Martin ; Lambert, Marian



URN: urn:nbn:de:bsz:180-madoc-670249
Document Type: Conference or workshop publication
Year of publication: 2023
Book title: Proceedings of the 1st Workshop on Fairness and Bias in AI co-located with the 26th European Conference on Artificial Intelligence (ECAI 2023), Kraków, Poland, October 1st, 2023
Journal or publication series: CEUR Workshop Proceedings
Volume: 3523
Page range: 1-14
Conference title: 1st Workshop on Fairness and Bias in AI
Location of the conference venue: Kraków, Poland
Date of the conference: 01.10.2023
Editors: Calegari, Roberta ; Tubella, Andrea Aler ; González Castañe, Gabriel ; Dignum, Virginia ; Milano, Michaela
Place of publication: Aachen, Germany
Publisher: RWTH Aachen
ISSN: 1613-0073
Publication language: English
Institution: Non-faculty institutions > Institut für Enterprise Systems (InES)
Pre-existing license: Creative Commons Attribution 4.0 International (CC BY 4.0)
Subject: 004 Computer science, internet
Keywords (English): fairness, debiasing, adversarial learning, NLP, LLMs
Abstract: Commercial applications increasingly build on large language models (LLMs). Given the inherent biases of LLMs, advancements in fairness research are urgent. Prior methods for mitigating biases in LLMs only address fairness in either language generation tasks or downstream tasks. Additionally, they often incur substantial computational costs by training from scratch. We propose a novel debiasing method that employs adversarial learning during model pre-training. Without hyperparameter optimization, our comparatively computationally efficient method demonstrates increased fairness on a natural language generation task while maintaining performance. In addition, we show that our fairness gains transfer to a downstream task, at a performance cost. We explore a fairness approach that holds significant potential for redefining the landscape of fairness in LLMs: by learning a single debiased model that can be applied to a variety of tasks, this approach eliminates the need for additional or task-specific debiasing steps. Hence, it facilitates the development of fair commercial applications and constitutes a step towards the broader goal of fairness in societies at large.
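The adversarial learning described in the abstract can be illustrated with a minimal gradient-reversal sketch. This is one common realization of adversarial debiasing, not the authors' exact architecture: the toy encoder stands in for the language model, the task head for the pre-training objective, and the adversary tries to recover a protected attribute from the encoder's representations while reversed gradients push the encoder to hide it.

```python
# Minimal adversarial-debiasing sketch with a gradient-reversal layer.
# All modules and data here are toy placeholders, not the paper's setup.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates (and scales) gradients on backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

torch.manual_seed(0)
encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU())  # stands in for the LLM
task_head = nn.Linear(32, 2)   # main (pre-training) objective
adversary = nn.Linear(32, 2)   # predicts the protected attribute

opt = torch.optim.Adam(
    list(encoder.parameters()) + list(task_head.parameters())
    + list(adversary.parameters()), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 16)             # toy inputs
y = torch.randint(0, 2, (64,))      # task labels
z = torch.randint(0, 2, (64,))      # protected-attribute labels

for _ in range(20):
    opt.zero_grad()
    h = encoder(x)
    task_loss = loss_fn(task_head(h), y)
    # The adversary learns to predict z; the reversed gradient makes the
    # encoder learn representations from which z is hard to recover.
    adv_loss = loss_fn(adversary(GradReverse.apply(h, 1.0)), z)
    (task_loss + adv_loss).backward()
    opt.step()
```

In this formulation a single debiased encoder results, which can then be reused across tasks without task-specific debiasing, matching the motivation stated in the abstract.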




This entry is part of the university bibliography.

The document is provided by the publication server of the University of Mannheim Library.








