New frontiers in neural probabilistic scoring: from attention to output generation in vision and language
Zhou, Yuxuan
PDF: Dissertation_Yuxuan_Zhou.pdf (published version, 35 MB)
URN: urn:nbn:de:bsz:180-madoc-710085
Document type: Dissertation
Year of publication: 2025
Place of publication: Mannheim, Germany
University: University of Mannheim
Referee: Keuper, Margret
Date of oral examination: 2025
Publication language: English
Institution: School of Business Informatics and Mathematics > Machine Learning (Keuper 2024-)
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Subject area: 004 Computer science
Classification: CCS: Artificial intelligence
Keywords (English): neural probabilistic scoring, attention, SoftMax
Abstract:
Recent advances in deep learning have highlighted the importance of probabilistic scoring within attention mechanisms and model predictions, significantly impacting tasks in computer vision and natural language processing. Neural probabilistic scoring refers to computing normalized relevance scores from the hidden features of a neural network (often via softmax) that sum to one and reflect the relative importance of different tokens or features, without necessarily representing true probability distributions. The traditional reliance on softmax-based attention and output distributions can constrain model capacity and reliability: the unimodal nature of softmax restricts the capture of sparse, multi-modal patterns and reduces robustness to signal noise, while permutation invariance in scoring discards spatial and structural information, hindering performance on tasks with complex geometry or topology. This thesis addresses these limitations by introducing novel methodologies that refine probabilistic scoring in both the attention and output layers, aiming to enhance the performance and scalability of machine learning models across vision and language tasks.
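The softmax scoring described above can be sketched in a few lines. The NumPy example below (with illustrative logits) shows the normalization to relevance scores that sum to one, and the dense, never-exactly-zero mass on irrelevant entries that motivates the sparsity concern:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Normalized relevance scores: non-negative and summing to one."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Two equally relevant tokens plus two irrelevant ones:
scores = softmax([4.0, 4.0, 0.0, 0.0])
# The two relevant tokens share the bulk of the mass, but the
# irrelevant entries still receive strictly positive weight; a lower
# temperature sharpens the distribution yet never reaches exact zeros.
```

This illustrates the trade-off: softmax can be made peakier, but exact sparsity (zero weight on irrelevant tokens) is unattainable, which is the tension between sparsity and multi-modality the thesis targets.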
The first block reimagines attention mechanisms. Central to it is MultiMax, a novel softmax alternative that achieves an improved balance between sparsity and multi-modality in the output distribution, enabling the attention mechanism to focus on multiple relevant contexts simultaneously while remaining resilient to irrelevant entries. In the vision domain, Sp-ViT introduces learnable 2D spatial priors into Vision Transformers, enhancing the model's ability to capture spatial relationships and improving image classification performance. For structured data, the work proposes a Hypergraph Transformer for skeleton-based action recognition, built on hypergraph attention and a positional encoding based on graph distances. It further extends this positional encoding with a topological encoding, which incorporates more comprehensive structural information through topological descriptors beyond the graph representation.
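The abstract does not spell out Sp-ViT's parameterization, but a common way to inject a 2D spatial prior into Transformer attention is to add a distance-dependent bias to the logits before the softmax. The sketch below (NumPy, with `alpha` standing in for a learned scale) illustrates the idea on a 4x4 patch grid; the actual Sp-ViT formulation may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Tokens on a 4x4 patch grid, embedding dimension 8.
H = W = 4; N = H * W; d = 8
q = rng.normal(size=(N, d))
k = rng.normal(size=(N, d))

# Hypothetical spatial prior: penalize attention between patches in
# proportion to their 2D Euclidean distance on the grid.
rows, cols = np.divmod(np.arange(N), W)
dist = np.hypot(rows[:, None] - rows[None, :], cols[:, None] - cols[None, :])
alpha = 0.5  # stand-in for a learnable parameter

logits = q @ k.T / np.sqrt(d) - alpha * dist
attn = softmax(logits, axis=-1)   # each row sums to one; nearby patches favored
```

The design choice here is additive: the prior biases, rather than hard-masks, the content-based scores, so distant patches can still attend to each other when their features match strongly.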
The second block focuses on output probabilistic scoring to improve model reliability for both discriminative and generative models. During training, MaxSup regularizes classifier outputs by mitigating the overconfidence in erroneous predictions and the representation collapse associated with label smoothing, leading to more reliable predictions and stronger feature representations. At inference, sampling-based decoding strategies modulate output distributions to improve LLM generations, balancing diversity and coherence in open-ended text generation. Together, MaxSup and LLM sampling provide a unified framework for output probabilistic scoring, ensuring reliability and quality in both classification and generative tasks.
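The abstract does not specify which decoding strategies are studied; as a representative example of sampling-based decoding that trades off diversity and coherence, the sketch below implements nucleus (top-p) sampling with temperature on a toy next-token distribution. All parameter values are illustrative, not taken from the thesis:

```python
import numpy as np

def sample_next_token(logits, temperature=0.8, top_p=0.9, rng=None):
    """Nucleus (top-p) sampling with temperature scaling."""
    if rng is None:
        rng = np.random.default_rng()
    z = np.asarray(logits, dtype=float) / temperature  # sharpen/flatten
    z = z - z.max()
    probs = np.exp(z)
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                    # most likely first
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, top_p) + 1]    # smallest nucleus >= top_p
    p = probs[keep] / probs[keep].sum()                # renormalize kept mass
    return int(rng.choice(keep, p=p))

token = sample_next_token([2.0, 1.5, 0.2, -1.0], rng=np.random.default_rng(1))
```

Lower temperature and smaller `top_p` push the sampler toward the greedy, coherent end; higher values restore diversity. With the logits above and these settings, only the two most likely tokens survive the nucleus cutoff.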
This entry is part of the university bibliography.
The document is provided by the publication server of the University of Mannheim Library.