Exploring discrete representations in stochastic computation graphs: Challenges, benefits, and novel strategies
Friede, David
URN: urn:nbn:de:bsz:180-madoc-669102
Document type: Dissertation
Year of publication: 2023
Place of publication: Mannheim
University: Universität Mannheim
Referee: Stuckenschmidt, Heiner
Date of oral examination: 23 January 2024
Publication language: English
Institution: Fakultät für Wirtschaftsinformatik und Wirtschaftsmathematik > Practical Computer Science II: Artificial Intelligence (Stuckenschmidt 2009-)
Subject area: 004 Computer science
Keywords (English): machine learning, deep learning, discrete latent representations, stochastic computation graphs, categorical variational autoencoder, gumbel-softmax distribution, disentangled representations, structure learning, neural architecture search
Abstract:
The evolution of deep learning has created a need for models with better interpretability and generalization behavior. Discrete representations play a significant role in this context since they tend to be more interpretable. This thesis explores discrete representations in Stochastic Computation Graphs (SCGs), focusing on the challenges, benefits, and novel strategies for their structure and parameter learning. Recent successes in model-based reinforcement learning and text-to-image generation have demonstrated the empirical advantages of discrete latent representations, yet the reasons behind these benefits remain unclear. Moreover, training deep learning models with discrete representations raises unique problems, primarily associated with differentiating through probability distributions. In response, we first establish a solid background for our research, focusing on SCGs. We then analyze both the challenges and the benefits of training models with discrete representations. In addition, we propose novel strategies to address these challenges and evaluate them experimentally across various domains. On the one hand, we propose learning the structure of computation graphs for efficient Neural Architecture Search. On the other hand, we propose altering the scale parameter of Gumbel noise perturbations and introducing dropout residual connections for efficient parameter learning of discrete SCGs. Furthermore, we present a new approach that employs a categorical Variational Autoencoder to enhance disentanglement. Extensive experimental evaluations across diverse domains demonstrate the effectiveness of the proposed methods: the challenges associated with training discrete representations can be significantly mitigated, and our strategies improve the models’ interpretability and generalization behavior. Our findings also reveal the inherent grid structure of categorical distributions as an efficient inductive prior for disentangled representations. This study provides critical insights into discrete representations in deep learning, extending our understanding and proposing novel methods with promising empirical results. It also highlights promising directions for the further refinement of discrete representations and their diverse applications.
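To make the abstract's central mechanism concrete, below is a minimal sketch, assuming a PyTorch-style setup, of the Gumbel-Softmax relaxation with an adjustable scale on the Gumbel noise, mirroring the scale-parameter idea mentioned above. The function name gumbel_softmax_sample and the noise_scale argument are illustrative, not code from the thesis.

import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=1.0, noise_scale=1.0):
    # Gumbel(0, 1) noise via the inverse-CDF trick: g = -log(-log(u)).
    u = torch.rand_like(logits).clamp_min(1e-10)
    gumbels = -torch.log(-torch.log(u))
    # Scale the perturbations (noise_scale=1.0 recovers the standard
    # Gumbel-Softmax of Jang et al. / Maddison et al.) and relax the
    # argmax with a temperature-controlled softmax.
    return F.softmax((logits + noise_scale * gumbels) / tau, dim=-1)

# Example: relaxed one-hot samples over 10 categories for a batch of 4.
logits = torch.randn(4, 10)
soft_one_hot = gumbel_softmax_sample(logits, tau=0.5, noise_scale=0.8)

PyTorch's built-in torch.nn.functional.gumbel_softmax fixes the noise at unit scale; the sampling is written out by hand here only to expose the scale knob the abstract refers to.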
This entry is part of the university bibliography.
The document is provided by the publication server of the University Library Mannheim.