Improving alignment and controllability in GANs and diffusion models

Li, Yumeng

URN:	urn:nbn:de:bsz:180-madoc-692757
Document type:	Dissertation
Year of publication:	2025
Place of publication:	Mannheim
University:	Universität Mannheim
Reviewer:	Keuper, Margret
Date of oral examination:	2025
Language of publication:	English
Institution:	Fakultät für Wirtschaftsinformatik und Wirtschaftsmathematik > Machine Learning (Keuper 2024-)
License:	Creative Commons Attribution 4.0 International (CC BY 4.0)
Subject:	004 Computer science
Keywords (English):	generative models, GANs, diffusion models, GenAI, computer vision
Abstract:	Visual generative modeling is a transformative area that aims to synthesize diverse, realistic-looking visual content, e.g., images and videos. Visual generative models are widely applied in domains ranging from creative art design and the visual effects industry to data augmentation for downstream computer vision tasks. Over the past decade, the field has made tremendous progress, advancing from Generative Adversarial Networks (GANs) to diffusion models. Although recent models achieve higher fidelity and improved training stability, it remains challenging to control the synthesis process and generate content precisely as desired. To this end, this thesis presents several new techniques aimed at improving alignment and controllability in GANs and diffusion models across various tasks, such as GAN inversion, layout-to-image, text-to-image, and text-to-video generation. Further, these enhancements make the models more effective in a wide range of real-world applications.