New frontiers in neural probabilistic scoring: from attention to output generation in vision and language


Zhou, Yuxuan


PDF: Dissertation_Yuxuan_Zhou.pdf - Published (35MB)

URN: urn:nbn:de:bsz:180-madoc-710085
Document Type: Doctoral dissertation
Year of publication: 2025
Place of publication: Mannheim, Germany
University: University of Mannheim
Evaluator: Keuper, Margret
Date of oral examination: 2025
Publication language: English
Institution: School of Business Informatics and Mathematics > Machine Learning (Keuper 2024-)
License: CC BY 4.0 Creative Commons Attribution 4.0 International (CC BY 4.0)
Subject: 004 Computer science, internet
Classification: CCS: Artificial intelligence
Keywords (English): neural probabilistic scoring, attention, softmax
Abstract: Recent advancements in deep learning have highlighted the importance of probabilistic scoring within attention mechanisms and model predictions, significantly impacting tasks in computer vision and natural language processing. Neural probabilistic scoring refers to the process of computing normalized relevance scores from the hidden features of a neural network, often via softmax, that sum to one and reflect the relative importance of different tokens or features without necessarily representing true probability distributions. Traditional reliance on softmax-based attention and output distributions can constrain model capacity and reliability: its unimodal nature restricts the capture of sparse, multi-modal patterns and reduces robustness to signal noise, while permutation invariance in scoring discards spatial and structural information, hindering performance on tasks with complex geometry or topology. This thesis addresses these limitations by introducing novel methodologies that refine probabilistic scoring in both the attention and output layers, aiming to enhance the performance and scalability of machine learning models across vision and language tasks. The first block reimagines attention mechanisms. Central to this is MultiMax, a novel softmax alternative that achieves an improved balance between sparsity and multi-modality in the output distribution, enabling the attention mechanism to focus on multiple relevant contexts simultaneously while remaining resilient to irrelevant entries. In the vision domain, Sp-ViT introduces learnable 2D spatial priors into Vision Transformers, enhancing the model's ability to capture spatial relationships and improving performance in image classification tasks. For structured data, the work proposes the Hypergraph Transformer to tackle skeleton-based action recognition, with hypergraph attention and a positional encoding based on graph distances as its core components.
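The normalized relevance scoring described above can be sketched with the standard softmax as used in scaled dot-product attention. This is a generic illustration of the baseline the thesis builds on, not code from the dissertation; the toy dimensions and variable names are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

# Toy attention scoring: one query against four keys with d = 8 features.
rng = np.random.default_rng(0)
q = rng.normal(size=(8,))        # query vector
K = rng.normal(size=(4, 8))      # four key vectors
scores = K @ q / np.sqrt(8)      # unnormalized relevance scores
weights = softmax(scores)        # normalized scores: positive, sum to one
```

Because every entry of `weights` is strictly positive, no token can be fully ignored; this is the kind of constraint that sparsity-aware alternatives such as MultiMax aim to relax.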
The work further extends the positional encoding with a topological encoding, which incorporates more comprehensive structural information through topological descriptors that go beyond the graph representation. The second block focuses on output probabilistic scoring to improve model reliability for both discriminative and generative models. During training, MaxSup regularizes classifier outputs, mitigating the overconfidence in erroneous predictions and the representation collapse associated with label smoothing, leading to more reliable predictions and stronger feature representations. At inference, sampling-based decoding strategies modulate output distributions to improve LLM outputs, balancing diversity and coherence in open-ended text generation. Together, MaxSup and LLM sampling provide a unified framework for output probabilistic scoring, ensuring reliability and quality in both classification and generative tasks.
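The sampling-based decoding mentioned above can be illustrated with standard temperature scaling and nucleus (top-p) filtering, which reshape the output distribution before sampling. This is a generic sketch of the general idea, not the thesis's specific strategies; `top_p_filter` and its parameters are hypothetical names:

```python
import numpy as np

def top_p_filter(logits, p=0.9, temperature=1.0):
    """Temperature-scale a logit vector, then keep only the smallest
    set of tokens whose cumulative probability mass reaches p."""
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]          # tokens by descending probability
    cum = np.cumsum(probs[order])
    keep = order[: int(np.searchsorted(cum, p)) + 1]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()         # renormalized distribution

logits = np.array([2.0, 1.0, 0.5, -1.0, -3.0])
dist = top_p_filter(logits, p=0.9)
```

Raising the temperature flattens the distribution (more diversity); lowering p truncates the low-probability tail (more coherence) — the trade-off the abstract refers to.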




This entry is part of the university bibliography.

The document is provided by the publication server of the Mannheim University Library.



