Using market design to improve red teaming of generative AI models

Rehse, Dominik ; Valet, Sebastian ; Walter, Johannes

PDF: pb06-24.pdf (published version, 168 kB)

URN:	urn:nbn:de:bsz:180-madoc-676119
Document type:	Working paper
Year of publication:	2024
Journal or series title:	ZEW policy brief
Volume:	2024-06
Place of publication:	Mannheim
Language of publication:	English
Institution:	Other institutions > ZEW - Leibniz-Zentrum für Europäische Wirtschaftsforschung
MADOC publication series:	Publications of ZEW (Leibniz-Zentrum für Europäische Wirtschaftsforschung) > ZEW policy brief
Subject:	330 Economics
Abstract:	With the final approval of the EU’s Artificial Intelligence Act (AI Act), it is now clear that general-purpose AI (GPAI) models with systemic risk will need to undergo adversarial testing. This provision is a response to the emergence of “generative AI” models, which are currently the most notable form of GPAI models generating rich-form content such as text, images, and video. Adversarial testing involves repeatedly interacting with a model to try to lead it to exhibit unwanted behaviour. However, the specific implementation of such testing for GPAI models with systemic risk has not been clearly spelled out in the AI Act. Instead, the legislation only refers to codes of practice and harmonised standards which are soon to be developed. In this policy brief, which is based on research funded by the Baden-Württemberg Foundation, we propose that these codes and standards should reflect that an effective adversarial testing regime requires testing by independent third parties, a well-defined goal, clear roles with proper incentive and coordination schemes for all parties involved, and standardised reporting of the results. The market design approach is helpful for developing, testing and improving the underlying rules and the institutional setup of such adversarial testing regimes. We outline the design space for an extensive form of adversarial testing, called red teaming, of generative AI models. This is intended to stimulate the discussion in preparation for the codes of practice, harmonised standards and potential additional provisions by governing bodies.

This document is provided by the publication server of the Universitätsbibliothek Mannheim.
