Efficient learning of discrete-continuous computation graphs


Friede, David ; Niepert, Mathias



URL: https://proceedings.neurips.cc/paper_files/paper/2...
Document type: Conference publication
Year of publication: 2022
Book title: 35th Conference on Neural Information Processing Systems (NeurIPS 2021) : online, 6-14 December 2021
Journal or series title: Advances in Neural Information Processing Systems
Volume: 34
Page range: 6720-6732
Event title: NeurIPS 2021
Event location: Online
Event date: 6-14 December 2021
Editors: Ranzato, Marc'Aurelio ; Beygelzimer, Alina ; Dauphin, Yann N. ; Liang, Percy ; Wortman Vaughan, Jennifer
Place of publication: Red Hook, NY
Publisher: Curran Associates
Language of publication: English
Institution: Fakultät für Wirtschaftsinformatik und Wirtschaftsmathematik > Practical Computer Science II: Artificial Intelligence (Stuckenschmidt 2009-)
Subject area: 004 Computer science
Abstract: Numerous models for supervised and reinforcement learning benefit from combinations of discrete and continuous model components. End-to-end learnable discrete-continuous models are compositional, tend to generalize better, and are more interpretable. A popular approach to building discrete-continuous computation graphs is to integrate discrete probability distributions into neural networks using stochastic softmax tricks. Prior work has mainly focused on computation graphs with a single discrete component on each of the graph's execution paths. We analyze the behavior of more complex stochastic computation graphs with multiple sequential discrete components. We show that it is challenging to optimize the parameters of these models, mainly due to small gradients and local minima. We then propose two new strategies to overcome these challenges. First, we show that increasing the scale parameter of the Gumbel noise perturbations during training improves the learning behavior. Second, we propose dropout residual connections specifically tailored to stochastic, discrete-continuous computation graphs. With an extensive set of experiments, we show that we can train complex discrete-continuous models which cannot be trained with standard stochastic softmax tricks. We also show that complex discrete-stochastic models generalize better than their continuous counterparts on several benchmark datasets.
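The abstract mentions two training strategies: scaling the Gumbel noise perturbations and adding dropout residual connections around discrete components. The following is a minimal PyTorch-style sketch of both ideas as they are described above, not the authors' implementation; the class names, the exact form of the residual path, and all parameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GumbelSoftmaxLayer(nn.Module):
    """Discrete component: relaxed one-hot sample from logits using
    Gumbel noise with a configurable scale (illustrative sketch)."""

    def __init__(self, noise_scale: float = 1.0, tau: float = 1.0):
        super().__init__()
        self.noise_scale = noise_scale  # scale of the Gumbel perturbations
        self.tau = tau                  # softmax temperature

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        # Sample standard Gumbel noise and multiply by `noise_scale`;
        # increasing this scale during training is the first strategy
        # described in the abstract.
        u = torch.rand_like(logits).clamp_min(1e-10)
        gumbel = -torch.log(-torch.log(u))
        y = (logits + self.noise_scale * gumbel) / self.tau
        return F.softmax(y, dim=-1)


class DropoutResidualBlock(nn.Module):
    """Wraps a discrete component with a dropout-style residual path:
    during training, with probability `p` the discrete bottleneck is
    bypassed and a continuous relaxation of the input is passed through
    instead (illustrative assumption about the residual mechanism)."""

    def __init__(self, discrete: nn.Module, p: float = 0.5):
        super().__init__()
        self.discrete = discrete
        self.p = p

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        if self.training and torch.rand(()) < self.p:
            # Skip the discrete component for this forward pass, which
            # keeps gradients flowing through deep stochastic graphs.
            return F.softmax(logits, dim=-1)
        return self.discrete(logits)
```

A training loop would typically stack several such blocks in sequence and anneal `noise_scale` upward over epochs, mirroring the paper's observation that larger Gumbel perturbations help optimization in graphs with multiple sequential discrete components.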




This entry is part of the university bibliography.



