Bias mitigation for large language models using adversarial learning

Ernst, Jasmina S. ; Marton, Sascha ; Brinkmann, Jannik ; Vellasques, Eduardo ; Foucard, Damien ; Kraemer, Martin ; Lambert, Marian

URN:	urn:nbn:de:bsz:180-madoc-670249
Document type:	Conference publication
Year of publication:	2023
Book title:	Proceedings of the 1st Workshop on Fairness and Bias in AI co-located with 26th European Conference on Artificial Intelligence (ECAI 2023), Kraków, Poland, October 1st, 2023
Journal or series title:	CEUR Workshop Proceedings
Volume:	3523
Page range:	1-14
Event title:	1st Workshop on Fairness and Bias in AI
Event location:	Kraków, Poland
Event date:	October 1, 2023
Editors:	Calegari, Roberta ; Tubella, Andrea Aler ; González Castañe, Gabriel ; Dignum, Virginia ; Milano, Michaela
Place of publication:	Aachen, Germany
Publisher:	RWTH Aachen
ISSN:	1613-0073
Related URLs:	https://ceur-ws.org/Vol-3523/
Publication language:	English
Institution:	Non-faculty institutions > Institute for Enterprise Systems (InES)
Existing license:	Creative Commons Attribution 4.0 International (CC BY 4.0)
Subject:	004 Computer science
Keywords (English):	fairness, debiasing, adversarial learning, NLP, LLMs
Abstract:	Commercial applications increasingly build on large language models (LLMs). Given the inherent biases of LLMs, advancements in fairness research are urgent. Prior methods for mitigating biases in LLMs address fairness only in either language generation tasks or downstream tasks. Additionally, they often incur substantial computational costs by training from scratch. We propose a novel debiasing method that employs adversarial learning during model pre-training. Without hyperparameter optimization, our comparatively computationally efficient method demonstrates increased fairness on a natural language generation task while maintaining performance. In addition, we show that our fairness gains transfer to a downstream task, albeit at a performance cost. We explore a fairness approach that holds significant potential for redefining the landscape of fairness of LLMs: by learning a single debiased model that can be applied to a variety of tasks, this approach eliminates the need for additional or task-specific debiasing steps. Hence, it facilitates the development of fair commercial applications and constitutes a step towards the broader goal of fairness in societies at large.