Visual generative modeling is a transformative area that aims to synthesize diverse, realistic visual content such as images and videos. These models are widely applied across domains, ranging from creative art design and the visual effects industry to data augmentation for downstream computer vision tasks. Over the past decade, the field has made tremendous progress, evolving from Generative Adversarial Networks (GANs) to diffusion models. Despite higher fidelity and improved training stability, it remains challenging to control the synthesis process and generate content precisely as desired. To address this, this thesis presents several new techniques for improving alignment and controllability in GANs and diffusion models across a range of tasks, including GAN inversion, layout-to-image, text-to-image, and text-to-video generation. These enhancements also make the models more effective in a wide range of real-world applications.