
Publications by our Members

2025


[1353]
Z. Jonassen, K. Lawrence, B. M. Wiesenfeld, S. Feuerriegel and D. Mann.
A qualitative analysis of remote patient monitoring: how a paradox mindset can support balancing emotional tensions in the design of healthcare technologies.
CSCW 2025 - 28th ACM SIGCHI Conference on Computer-Supported Cooperative Work and Social Computing. Bergen, Norway, Oct 18-22, 2025. To be published. Preprint available. arXiv
Abstract

Remote patient monitoring (RPM) is the use of digital technologies to improve patient care at a distance. However, current RPM solutions are often biased toward tech-savvy patients. To foster health equity, researchers have studied how to address the socio-economic and cognitive needs of diverse patient groups, but their emotional needs have remained largely neglected. We perform the first qualitative study to explore the emotional needs of diverse patients around RPM. Specifically, we conduct a thematic analysis of 18 interviews and 4 focus groups at a large US healthcare organization. We identify emotional needs that lead to four emotional tensions within and across stakeholder groups when applying an equity focus to the design and implementation of RPM technologies. The four emotional tensions are making diverse patients feel: (i) heard vs. exploited; (ii) seen vs. deprioritized for efficiency; (iii) empowered vs. anxious; and (iv) cared for vs. detached from care. To manage these emotional tensions across stakeholders, we develop design recommendations informed by a paradox mindset (i.e., ‘both-and’ rather than ‘either-or’ strategies).

MCML Authors

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1352]
M. Windl, R. Amberg and T. Kosch.
The Illusion of Privacy: Investigating User Misperceptions in Browser Tracking Protection.
CHI 2025 - Conference on Human Factors in Computing Systems. Yokohama, Japan, Apr 26-May 01, 2025. To be published.
Abstract

Third parties track users’ web browsing activities, raising privacy concerns. Tracking protection extensions prevent this, but their influence on privacy protection beliefs shaped by narratives remains uncertain. This paper investigates users’ misperception of the tracking protection offered by browser plugins. Our study explores how different narratives influence users’ perceived privacy protection by examining three tracking protection extension narratives: no protection, functional protection, and a placebo. In a study (N=36), participants evaluated their anticipated protection during a hotel booking process, influenced by the narrative about the plugin’s functionality. However, all participants viewed the same website without tracking protection adaptations. We show that users feel more protected when informed they use a functional or placebo extension, compared to no protection. Our findings highlight the deceptive nature of misleading privacy tools and emphasize the need for greater transparency to keep users from developing a false sense of protection, as such misleading tools also negatively affect user study results.

MCML Authors

Maximiliane Windl

Human-Centered Ubiquitous Media


[1351]
M. Windl, P. Z. Laboda and S. Mayer.
Designing Effective Consent Mechanisms for Spontaneous Interactions in Augmented Reality.
CHI 2025 - Conference on Human Factors in Computing Systems. Yokohama, Japan, Apr 26-May 01, 2025. To be published.
Abstract

Ubiquitous computing devices like Augmented Reality (AR) glasses allow countless spontaneous interactions – all serving different goals. AR devices rely on data transfer to personalize recommendations and adapt to the user. Today’s consent mechanisms, such as privacy policies, are suitable for long-lasting interactions; however, how users can consent to fast, spontaneous interactions is unclear. We first conducted two focus groups (N=17) to identify privacy-relevant scenarios in AR. We then conducted expert interviews (N=11) with co-design activities to establish effective consent mechanisms. Based on that, we contribute (1) a validated scenario taxonomy to define privacy-relevant AR interaction scenarios, (2) a flowchart to decide on the type of mechanisms considering contextual factors, (3) a design continuum and design aspects chart to create the mechanisms, and (4) a trade-off and prediction chart to evaluate the mechanism. Thus, we contribute a conceptual framework fostering a privacy-preserving future with AR.

MCML Authors

Maximiliane Windl

Human-Centered Ubiquitous Media


Sven Mayer

Prof. Dr.

Human-Computer Interaction and Artificial Intelligence


[1350]
M. Windl, P. Thalhammer, D. Müller, A. Schmidt and S. S. Feger.
PrivacyHub: A Functional Tangible and Digital Ecosystem for Interoperable Smart Home Privacy Awareness and Control.
CHI 2025 - Conference on Human Factors in Computing Systems. Yokohama, Japan, Apr 26-May 01, 2025. To be published.
Abstract

Hubs are at the core of most smart homes. Modern cross-ecosystem protocols and standards enable smart home hubs to achieve interoperability across devices, offering the unique opportunity to integrate universally available smart home privacy awareness and control features. To date, such privacy features mainly focus on individual products or prototypical research artifacts. We developed a cross-ecosystem hub featuring a tangible dashboard and a digital web application to deepen our understanding of how smart home users interact with functional privacy features. The ecosystem allows users to control the connectivity states of their devices and raises awareness by visualizing device positions, states, and data flows. We deployed the ecosystem in six households for one week and found that it increased participants’ perceived control, awareness, and understanding of smart home privacy. We further found distinct differences between tangible and digital mechanisms. Our findings highlight the value of cross-ecosystem hubs for effective privacy management.

MCML Authors

Maximiliane Windl

Human-Centered Ubiquitous Media


Albrecht Schmidt

Prof. Dr.

Human-Centered Ubiquitous Media


[1349]
M. Schröder, V. Melnychuk and S. Feuerriegel.
Differentially private learners for heterogeneous treatment effects.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025.
Abstract

Patient data is widely used to estimate heterogeneous treatment effects and understand the effectiveness and safety of drugs. Yet, patient data includes highly sensitive information that must be kept private. In this work, we aim to estimate the conditional average treatment effect (CATE) from observational data under differential privacy. Specifically, we present DP-CATE, a novel framework for CATE estimation that is doubly robust and ensures differential privacy of the estimates. For this, we build upon non-trivial tools from semi-parametric and robust statistics to exploit the connection between privacy and model robustness. Our framework is highly general and applies to any two-stage CATE meta-learner with a Neyman-orthogonal loss function. It can be used with all machine learning models employed for nuisance estimation. We further provide an extension of DP-CATE where we employ RKHS regression to release the complete doubly robust CATE function while ensuring differential privacy. We demonstrate the effectiveness of DP-CATE across various experiments using synthetic and real-world datasets. To the best of our knowledge, we are the first to provide a framework for CATE estimation that is doubly robust and differentially private.

MCML Authors

Maresa Schröder

Artificial Intelligence in Management


Valentyn Melnychuk

Artificial Intelligence in Management


Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1348]
M. Bini, L. Girrbach and Z. Akata.
Decoupling Angles and Strength in Low-rank Adaptation.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published.
Abstract

Parameter-Efficient Fine-Tuning (PEFT) methods have recently gained immense popularity thanks to the wide availability of large-scale models, as they allow pretrained models to be quickly adapted to downstream tasks at minimal computational cost. However, current additive fine-tuning methods such as LoRA show low robustness to prolonged training and hyperparameter choices, preventing optimal out-of-the-box usage. On the other hand, multiplicative and bounded approaches such as ETHER, while providing higher robustness, only allow for extremely low-rank adaptations and are limited to a fixed-strength transformation, hindering the expressive power of the adaptation. In this work, we propose DeLoRA, a fine-tuning method that first normalizes and then scales the learnable low-rank matrices, thus effectively bounding the transformation strength. This leads to increased hyperparameter robustness at no cost in performance. We show that the proposed approach effectively and consistently improves over popular PEFT methods, evaluating it on two fine-tuning tasks: subject-driven image generation and LLM instruction tuning.
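The core idea (normalize the low-rank update, then scale it, so its strength stays bounded) can be sketched in a few lines of NumPy. This is an illustrative reading of the abstract, not the paper's implementation: the function name and the use of a single Frobenius normalization over the full update are assumptions.

```python
import numpy as np

def delora_update(B, A, scale):
    """DeLoRA-style update (sketch): normalize the low-rank product,
    then scale it, so the transformation strength is bounded by `scale`.
    Illustrative reading of the abstract; the paper operates on the
    learnable matrices themselves and details may differ."""
    delta = B @ A                                # plain LoRA-style update
    norm = np.linalg.norm(delta)                 # Frobenius norm
    return scale * delta / max(norm, 1e-12)

rng = np.random.default_rng(0)
B, A = rng.normal(size=(8, 2)), rng.normal(size=(2, 8))
delta_w = delora_update(B, A, scale=0.5)
print(np.linalg.norm(delta_w))  # update magnitude is pinned to 0.5
```

Because the norm of the update is fixed, a poorly chosen learning rate or prolonged training cannot blow up the adaptation strength, which is one plausible source of the reported hyperparameter robustness.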

MCML Authors

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1347]
H. Baniecki, G. Casalicchio, B. Bischl and P. Biecek.
Efficient and Accurate Explanation Estimation with Distribution Compression.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv
Abstract

We discover a theoretical connection between explanation estimation and distribution compression that significantly improves the approximation of feature attributions, importance, and effects. While the exact computation of various machine learning explanations requires numerous model inferences and becomes impractical, the computational cost of approximation increases with an ever-increasing size of data and model parameters. We show that the standard i.i.d. sampling used in a broad spectrum of algorithms for post-hoc explanation leads to an approximation error worthy of improvement. To this end, we introduce Compress Then Explain (CTE), a new paradigm of sample-efficient explainability. It relies on distribution compression through kernel thinning to obtain a data sample that best approximates its marginal distribution. CTE significantly improves the accuracy and stability of explanation estimation with negligible computational overhead. It often achieves an on-par explanation approximation error 2-3x faster by using fewer samples, i.e. requiring 2-3x fewer model evaluations. CTE is a simple, yet powerful, plug-in for any explanation method that now relies on i.i.d. sampling.
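The compress-then-explain recipe can be illustrated with a toy stand-in. The paper uses kernel thinning; the sketch below substitutes greedy kernel herding (a related but different compression scheme), and the RBF kernel, toy model, and all names are assumptions.

```python
import numpy as np

def herding_compress(X, m, gamma=1.0):
    """Greedy kernel herding as a stand-in for the kernel thinning used
    by CTE (assumption): select m points whose empirical distribution
    best matches that of X under an RBF kernel."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)
    target = K.mean(axis=1)            # mean kernel embedding of X
    chosen, acc = [], np.zeros(len(X))
    for t in range(m):
        scores = target - acc / (t + 1)
        scores[chosen] = -np.inf       # no repeated points
        i = int(np.argmax(scores))
        chosen.append(i)
        acc += K[:, i]
    return X[chosen]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
model = lambda X: X @ np.array([1.0, -2.0, 0.5])   # toy model
Xc = herding_compress(X, m=20)
# Background expectation E[f(X)] -- the costly quantity behind many
# post-hoc explanations -- estimated from 20 points instead of 200:
print(model(Xc).mean(), model(X).mean())
```

Any explanation method that averages model evaluations over a background sample can then run on the compressed set, cutting the number of model calls by the compression ratio.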

MCML Authors

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science


Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[1346]
S. Dahan, G. Bénédict, L. Z. J. Williams, Y. Guo, D. Rückert, R. Leech and E. C. Robinson.
SIM: Surface-based fMRI Analysis for Inter-Subject Multimodal Decoding from Movie-Watching Experiments.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv GitHub
Abstract

Current AI frameworks for brain decoding and encoding typically train and test models within the same datasets. This limits their utility for brain-computer interfaces (BCIs) or neurofeedback, for which it would be useful to pool experiences across individuals to better simulate stimuli not sampled during training. A key obstacle to model generalisation is the degree of variability of inter-subject cortical organisation, which makes it difficult to align or compare cortical signals across participants. In this paper, we address this through the use of surface vision transformers, which build a generalisable model of cortical functional dynamics by encoding the topography of cortical networks and their interactions as a moving image across a surface. This is then combined with tri-modal self-supervised contrastive (CLIP) alignment of audio, video, and fMRI modalities to enable the retrieval of visual and auditory stimuli from patterns of cortical activity (and vice versa). We validate our approach on 7T task-fMRI data from 174 healthy participants engaged in the movie-watching experiment from the Human Connectome Project (HCP). Results show that it is possible to detect which movie clips an individual is watching purely from their brain activity, even for individuals and movies not seen during training. Further analysis of attention maps reveals that our model captures individual patterns of brain activity that reflect semantic and visual systems. This opens the door to future personalised simulations of brain function.

MCML Authors

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[1345]
D. Frauen, K. Hess and S. Feuerriegel.
Model-agnostic meta-learners for estimating heterogeneous treatment effects over time.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv
Abstract

Estimating heterogeneous treatment effects (HTEs) over time is crucial in many disciplines such as personalized medicine. For example, electronic health records are commonly collected over several time periods and then used to personalize treatment decisions. Existing works for this task have mostly focused on model-based learners (i.e., learners that adapt specific machine-learning models). In contrast, model-agnostic learners – so-called meta-learners – are largely unexplored. In our paper, we propose several meta-learners that are model-agnostic and thus can be used in combination with arbitrary machine learning models (e.g., transformers) to estimate HTEs over time. Here, our focus is on learners that can be obtained via weighted pseudo-outcome regressions, which allows for efficient estimation by targeting the treatment effect directly. We then provide a comprehensive theoretical analysis that characterizes the different learners and that allows us to offer insights into when specific learners are preferable. Finally, we confirm our theoretical insights through numerical experiments. In sum, while meta-learners are already state-of-the-art for the static setting, we are the first to propose a comprehensive set of meta-learners for estimating HTEs in the time-varying setting.
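A weighted pseudo-outcome regression is easiest to see in the static setting, which the paper generalizes to the time-varying case. The sketch below implements the classical DR-learner pseudo-outcome with oracle nuisance functions on synthetic data; in practice the nuisances would be arbitrary ML estimates, which is exactly what makes the learner model-agnostic.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
X = rng.uniform(-2, 2, size=(n, 1))
e = 1 / (1 + np.exp(-X[:, 0]))           # true propensity score
T = rng.binomial(1, e)
tau = 1.0 + X[:, 0]                      # true heterogeneous effect
Y = X[:, 0] + T * tau + rng.normal(size=n)

# Oracle nuisance functions for brevity; a meta-learner would plug in
# any fitted ML models here.
mu0, mu1 = X[:, 0], X[:, 0] + tau

# DR-learner pseudo-outcome: regressing it on X targets the CATE directly.
phi = (T - e) / (e * (1 - e)) * (Y - np.where(T == 1, mu1, mu0)) + mu1 - mu0

# Second stage: any regressor works; here ordinary least squares.
Z = np.c_[np.ones(n), X]
beta = np.linalg.lstsq(Z, phi, rcond=None)[0]
print(beta)  # approximately [1.0, 1.0], i.e. tau(x) = 1 + x
```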

MCML Authors

Dennis Frauen

Artificial Intelligence in Management


Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1344]
F. Fumagalli, M. Muschalik, P. Frazzetto, J. Strotherm, L. Hermes, A. Sperduti, E. Hüllermeier and B. Hammer.
Exact Computation of Any-Order Shapley Interactions for Graph Neural Networks.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv
Abstract

Despite the ubiquitous use of Graph Neural Networks (GNNs) in machine learning (ML) prediction tasks involving graph-structured data, their interpretability remains challenging. In explainable artificial intelligence (XAI), the Shapley Value (SV) is the predominant method to quantify contributions of individual features to an ML model’s output. Addressing the limitations of SVs in complex prediction models, Shapley Interactions (SIs) extend the SV to groups of features. In this work, we explain single graph predictions of GNNs with SIs that quantify node contributions and interactions among multiple nodes. By exploiting the GNN architecture, we show that the structure of interactions in node embeddings is preserved for graph prediction. As a result, the exponential complexity of SIs depends only on the receptive fields, i.e., the message-passing ranges determined by the connectivity of the graph and the number of convolutional layers. Based on our theoretical results, we introduce GraphSHAP-IQ, an efficient approach to compute any-order SIs exactly. GraphSHAP-IQ is applicable to popular message-passing techniques in conjunction with a linear global pooling and output layer. We showcase that GraphSHAP-IQ substantially reduces the exponential complexity of computing exact SIs on multiple benchmark datasets. Beyond exact computation, we evaluate GraphSHAP-IQ’s approximation of SIs on popular GNN architectures and compare it with existing baselines. Lastly, we visualize SIs of real-world water distribution networks and molecule structures using SI-Graphs.
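For intuition, the exponential baseline that such methods improve on looks like this: exact Shapley values (the order-1 case of SIs) computed by enumerating every coalition of a toy edge-counting game. This is textbook Shapley computation on an invented game, not the paper's GraphSHAP-IQ algorithm.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values by enumerating all coalitions.
    `v` maps a frozenset coalition to its worth."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                S = frozenset(S)
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (v(S | {i}) - v(S))  # weighted marginal contribution
        phi[i] = total
    return phi

# Toy "graph" game: a coalition's worth is the number of edges it contains.
edges = {frozenset({1, 2}), frozenset({2, 3})}
v = lambda S: sum(1 for edge in edges if edge <= S)
phi = shapley_values([1, 2, 3], v)
print(phi)  # each edge's worth splits between its endpoints: {1: 0.5, 2: 1.0, 3: 0.5}
```

The double loop over coalitions is what scales as 2^n in the number of players; restricting the game to a node's receptive field, as the paper does, shrinks the exponent.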

MCML Authors

Maximilian Muschalik

Artificial Intelligence & Machine Learning


Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[1343]
L. Girrbach, Y. Huang, S. Alaniz, T. Darrell and Z. Akata.
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs).
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv
Abstract

Pre-trained large language models (LLMs) have been reliably integrated with visual input for multimodal tasks. The widespread adoption of instruction-tuned image-to-text vision-language assistants (VLAs) like LLaVA and InternVL necessitates evaluating gender biases. We study gender bias in 22 popular open-source VLAs with respect to personality traits, skills, and occupations. Our results show that VLAs replicate human biases likely present in the data, such as real-world occupational imbalances. Similarly, they tend to attribute more skills and positive personality traits to women than to men, and we see a consistent tendency to associate negative personality traits with men. To eliminate the gender bias in these models, we find that finetuning-based debiasing methods achieve the best tradeoff between debiasing and retaining performance on downstream tasks. We argue for pre-deploying gender bias assessment in VLAs and motivate further development of debiasing strategies to ensure equitable societal outcomes.

MCML Authors

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1342]
K. Hess and S. Feuerriegel.
Stabilized Neural Prediction of Potential Outcomes in Continuous Time.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv
Abstract

Patient trajectories from electronic health records are widely used to predict potential outcomes of treatments over time, which then allows care to be personalized. Yet, existing neural methods for this purpose have a key limitation: while some adjust for time-varying confounding, these methods assume that the time series are recorded in discrete time. In other words, they are constrained to settings where measurements and treatments are conducted at fixed time steps, even though this is unrealistic in medical practice. In this work, we aim to predict potential outcomes in continuous time. The latter is of direct practical relevance because it allows for modeling patient trajectories where measurements and treatments take place at arbitrary, irregular timestamps. We thus propose a new method called stabilized continuous time inverse propensity network (SCIP-Net). For this, we further derive stabilized inverse propensity weights for robust prediction of the potential outcomes. To the best of our knowledge, our SCIP-Net is the first neural method that performs proper adjustments for time-varying confounding in continuous time.
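A discrete-time simplification conveys what stabilized inverse propensity weights do (the paper derives them in continuous time, which is its contribution): per patient, the ratio of marginal to history-conditional treatment probabilities is multiplied over time steps, and the marginal in the numerator keeps the product from exploding. The toy data and names below are assumptions.

```python
import numpy as np

def stabilized_weights(treatments, marginal_p, conditional_p):
    """Discrete-time sketch of stabilized inverse propensity weights:
    for each patient, multiply p(A_t) / p(A_t | history) over time
    steps. Dividing by the marginal stabilizes the weights."""
    ratio = np.where(treatments == 1,
                     marginal_p / conditional_p,
                     (1 - marginal_p) / (1 - conditional_p))
    return ratio.prod(axis=1)  # product over time steps

rng = np.random.default_rng(3)
n, T = 1000, 5
cond = np.clip(rng.beta(2, 2, size=(n, T)), 0.05, 0.95)  # p(A_t=1 | history)
A = rng.binomial(1, cond)
marg = cond.mean(axis=0, keepdims=True)                  # marginal p(A_t=1)
sw = stabilized_weights(A, marg, cond)
print(sw.mean())  # stabilized weights average close to 1
```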

MCML Authors

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1341]
Y. Li, D. Rügamer, B. Bischl and M. Rezaei.
Calibrating LLMs with Information-Theoretic Evidential Deep Learning.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. URL
Abstract

Fine-tuned large language models (LLMs) often exhibit overconfidence, particularly when trained on small datasets, resulting in poor calibration and inaccurate uncertainty estimates. Evidential Deep Learning (EDL), an uncertainty-aware approach, enables uncertainty estimation in a single forward pass, making it a promising method for calibrating fine-tuned LLMs. However, despite its computational efficiency, EDL is prone to overfitting, as its training objective can result in overly concentrated probability distributions. To mitigate this, we propose regularizing EDL by incorporating an information bottleneck (IB). Our approach IB-EDL suppresses spurious information in the evidence generated by the model and encourages truly predictive information to influence both the predictions and uncertainty estimates. Extensive experiments across various fine-tuned LLMs and tasks demonstrate that IB-EDL outperforms both existing EDL and non-EDL approaches. By improving the trustworthiness of LLMs, IB-EDL facilitates their broader adoption in domains requiring high levels of confidence calibration.
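The EDL backbone being calibrated can be sketched in plain Python: evidence from one forward pass parameterizes a Dirichlet, yielding class probabilities and a scalar uncertainty mass at once. The information-bottleneck regularizer that is the paper's contribution is omitted here, and the softplus evidence function is a common but assumed choice.

```python
import math

def edl_head(logits):
    """Evidential Deep Learning head (sketch): softplus turns logits into
    non-negative evidence, Dirichlet parameters are alpha = evidence + 1,
    and u = K / sum(alpha) is a single scalar uncertainty mass -- all
    obtained in one forward pass, with no sampling."""
    evidence = [math.log1p(math.exp(z)) for z in logits]  # softplus
    alpha = [ev + 1.0 for ev in evidence]
    S = sum(alpha)
    probs = [a / S for a in alpha]
    u = len(alpha) / S
    return probs, u

# Almost no evidence for any class -> uncertainty near its maximum of 1.
probs, u = edl_head([-9.0, -9.0, -9.0])
print(u)
```

Overfitting shows up here as overly large evidence (a concentrated Dirichlet with u near 0 even off-distribution); suppressing spurious evidence is precisely what the proposed IB regularizer targets.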

MCML Authors

Yawei Li

Statistical Learning & Data Science


David Rügamer

Prof. Dr.

Data Science Group


Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


Mina Rezaei

Dr.

Statistical Learning & Data Science


[1340]
L. Lux, A. H. Berger, A. Weers, N. Stucki, D. Rückert, U. Bauer and J. C. Paetzold.
Topograph: An efficient Graph-Based Framework for Strictly Topology Preserving Image Segmentation.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv
Abstract

Topological correctness plays a critical role in many image segmentation tasks, yet most networks are trained using pixel-wise loss functions, such as Dice, neglecting topological accuracy. Existing topology-aware methods often lack robust topological guarantees, are limited to specific use cases, or impose high computational costs. In this work, we propose a novel, graph-based framework for topologically accurate image segmentation that is both computationally efficient and generally applicable. Our method constructs a component graph that fully encodes the topological information of both the prediction and ground truth, allowing us to efficiently identify topologically critical regions and aggregate a loss based on local neighborhood information. Furthermore, we introduce a strict topological metric capturing the homotopy equivalence between the union and intersection of prediction-label pairs. We formally prove the topological guarantees of our approach and empirically validate its effectiveness on binary and multi-class datasets. Our loss demonstrates state-of-the-art performance with up to fivefold faster loss computation compared to persistent homology methods.
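The kind of topological bookkeeping a component graph encodes can be illustrated with a union-find count of connected components (Betti-0): a one-pixel error can leave pixel-wise losses almost unchanged while merging two structures into one. This sketches the concept only, not the paper's graph construction or loss.

```python
def components(mask):
    """Count 4-connected foreground components of a binary mask
    using union-find with path halving."""
    h, w = len(mask), len(mask[0])
    parent = {(y, x): (y, x) for y in range(h) for x in range(w) if mask[y][x]}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                if y and mask[y - 1][x]:       # merge with pixel above
                    parent[find((y, x))] = find((y - 1, x))
                if x and mask[y][x - 1]:       # merge with pixel to the left
                    parent[find((y, x))] = find((y, x - 1))
    return len({find(p) for p in parent})

label = [[1, 1, 0, 1],
         [0, 0, 0, 1]]   # two separate structures
pred  = [[1, 1, 1, 1],
         [0, 0, 0, 1]]   # one extra pixel, but the structures merge
print(components(label), components(pred))  # -> 2 1
```

A pixel-wise loss barely penalizes the single wrong pixel in `pred`, yet the prediction has the wrong topology; a topology-aware loss flags exactly this kind of critical region.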

MCML Authors

Laurin Lux

Artificial Intelligence in Healthcare and Medicine


Nico Stucki

Applied Topology and Geometry


Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


Ulrich Bauer

Prof. Dr.

Applied Topology and Geometry


[1339]
L. Sang, Z. Canfes, D. Cao, F. Bernard and D. Cremers.
Implicit Neural Surface Deformation with Explicit Velocity Fields.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv
Abstract

In this work, we introduce the first unsupervised method that simultaneously predicts time-varying neural implicit surfaces and deformations between pairs of point clouds. We propose to model the point movement using an explicit velocity field and directly deform a time-varying implicit field using the modified level-set equation. This equation utilizes an iso-surface evolution with Eikonal constraints in a compact formulation, ensuring the integrity of the signed distance field. By applying a smooth, volume-preserving constraint to the velocity field, our method successfully recovers physically plausible intermediate shapes. Our method is able to handle both rigid and non-rigid deformations without any intermediate shape supervision. Our experimental results demonstrate that our method significantly outperforms existing works, delivering superior results in both quality and efficiency.

MCML Authors

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1338]
M. Schröder, V. Melnychuk and S. Feuerriegel.
Differentially private learners for heterogeneous treatment effects.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. URL
Abstract

Patient data is widely used to estimate heterogeneous treatment effects and understand the effectiveness and safety of drugs. Yet, patient data includes highly sensitive information that must be kept private. In this work, we aim to estimate the conditional average treatment effect (CATE) from observational data under differential privacy. Specifically, we present DP-CATE, a novel framework for CATE estimation that is doubly robust and ensures differential privacy of the estimates. For this, we build upon non-trivial tools from semi-parametric and robust statistics to exploit the connection between privacy and model robustness. Our framework is highly general and applies to any two-stage CATE meta-learner with a Neyman-orthogonal loss function. It can be used with all machine learning models employed for nuisance estimation. We further provide an extension of DP-CATE where we employ RKHS regression to release the complete doubly robust CATE function while ensuring differential privacy. We demonstrate the effectiveness of DP-CATE across various experiments using synthetic and real-world datasets. To the best of our knowledge, we are the first to provide a framework for CATE estimation that is doubly robust and differentially private.
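For background, the classical Gaussian mechanism shows how a statistic is released under (ε, δ)-differential privacy by adding noise calibrated to its sensitivity. DP-CATE's actual noise calibration goes through the estimator's Neyman-orthogonal structure and is more involved, so treat this only as the textbook building block.

```python
import math, random

def gaussian_mechanism(value, sensitivity, epsilon, delta, rng=random):
    """Classical (epsilon, delta)-DP Gaussian mechanism: add zero-mean
    Gaussian noise whose scale grows with the statistic's sensitivity
    and shrinks with the privacy budget epsilon."""
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    return value + rng.gauss(0.0, sigma), sigma

rng = random.Random(0)
private_est, sigma = gaussian_mechanism(0.31, sensitivity=0.01,
                                        epsilon=1.0, delta=1e-5, rng=rng)
print(sigma)  # noise scale: smaller epsilon (more privacy) -> more noise
```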

MCML Authors

Maresa Schröder

Artificial Intelligence in Management


Valentyn Melnychuk

Artificial Intelligence in Management


Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1337]
E. Sommer, J. Robnik, G. Nozadze, U. Seljak and D. Rügamer.
Microcanonical Langevin Ensembles: Advancing the Sampling of Bayesian Neural Networks.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. URL
Abstract

Despite recent advances, sampling-based inference for Bayesian Neural Networks (BNNs) remains a significant challenge in probabilistic deep learning. While sampling-based approaches do not require a variational distribution assumption, current state-of-the-art samplers still struggle to navigate the complex and highly multimodal posteriors of BNNs. As a consequence, sampling still requires considerably longer inference times than non-Bayesian methods even for small neural networks, despite recent advances in making software implementations more efficient. Besides the difficulty of finding high-probability regions, the time until samplers provide sufficient exploration of these areas remains unpredictable. To tackle these challenges, we introduce an ensembling approach that leverages strategies from optimization and a recently proposed sampler called Microcanonical Langevin Monte Carlo (MCLMC) for efficient, robust and predictable sampling performance. Compared to approaches based on the state-of-the-art No-U-Turn Sampler, our approach delivers substantial speedups up to an order of magnitude, while maintaining or improving predictive performance and uncertainty quantification across diverse tasks and data modalities. The suggested Microcanonical Langevin Ensembles and modifications to MCLMC additionally enhance the method’s predictability in resource requirements, facilitating easier parallelization. All in all, the proposed method offers a promising direction for practical, scalable inference for BNNs.

MCML Authors

David Rügamer

Prof. Dr.

Data Science Group


[1336]
T. Uscidda, L. Eyring, K. Roth, F. J. Theis, Z. Akata and M. Cuturi.
Disentangled Representation Learning with the Gromov-Monge Gap.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv
Abstract

Learning disentangled representations from unlabelled data is a fundamental challenge in machine learning. Solving it may unlock other problems, such as generalization, interpretability, or fairness. Although remarkably challenging to solve in theory, disentanglement is often achieved in practice through prior matching. Furthermore, recent works have shown that prior matching approaches can be enhanced by leveraging geometrical considerations, e.g., by learning representations that preserve geometric features of the data, such as distances or angles between points. However, matching the prior while preserving geometric features is challenging, as a mapping that fully preserves these features while aligning the data distribution with the prior does not exist in general. To address these challenges, we introduce a novel approach to disentangled representation learning based on quadratic optimal transport. We formulate the problem using Gromov-Monge maps that transport one distribution onto another with minimal distortion of predefined geometric features, preserving them as much as can be achieved. To compute such maps, we propose the Gromov-Monge-Gap (GMG), a regularizer quantifying whether a map moves a reference distribution with minimal geometry distortion. We demonstrate the effectiveness of our approach for disentanglement across four standard benchmarks, outperforming other methods leveraging geometric considerations.

MCML Authors

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems


Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1335]
Y. Wang, M. Schröder, D. Frauen, J. Schweisthal, K. Hess and S. Feuerriegel.
Constructing Confidence Intervals for Average Treatment Effects from Multiple Datasets.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv
Abstract

Constructing confidence intervals (CIs) for the average treatment effect (ATE) from patient records is crucial to assess the effectiveness and safety of drugs. However, patient records typically come from different hospitals, thus raising the question of how multiple observational datasets can be effectively combined for this purpose. In our paper, we propose a new method that estimates the ATE from multiple observational datasets and provides valid CIs. Our method makes few assumptions about the observational datasets and is thus widely applicable in medical practice. The key idea of our method is that we leverage prediction-powered inferences and thereby essentially ‘shrink’ the CIs so that we offer more precise uncertainty quantification compared to naïve approaches. We further prove the unbiasedness of our method and the validity of our CIs. We confirm our theoretical results through various numerical experiments. Finally, we provide an extension of our method for constructing CIs from combinations of experimental and observational datasets.
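The prediction-powered idea behind the shrunken CIs can be sketched for a simple mean: model predictions on a large unlabeled set plus a 'rectifier' that corrects their bias using the small labeled set. Function names and the toy data are assumptions; the paper applies this machinery to ATEs pooled across datasets rather than to a plain mean.

```python
import math, random

def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def ppi_mean_ci(y_lab, f_lab, f_unlab, z=1.96):
    """Prediction-powered estimate of a mean: predictions on a large
    unlabeled set, plus a rectifier (mean prediction error) from the
    small labeled set. The two variance terms add up to a
    normal-approximation CI that is typically narrower than one built
    from the labeled outcomes alone."""
    n, N = len(y_lab), len(f_unlab)
    rect = [y - f for y, f in zip(y_lab, f_lab)]
    est = sum(f_unlab) / N + sum(rect) / n
    half = z * math.sqrt(sample_var(f_unlab) / N + sample_var(rect) / n)
    return est - half, est, est + half

rng = random.Random(4)
truth = 2.0
y_lab = [truth + rng.gauss(0, 1) for _ in range(100)]        # small labeled set
f_lab = [y - 0.9 + rng.gauss(0, 0.1) for y in y_lab]         # biased model
f_unlab = [truth - 0.9 + rng.gauss(0, 1) for _ in range(10000)]
lo, est, hi = ppi_mean_ci(y_lab, f_lab, f_unlab)
print(lo, est, hi)  # interval centered near the true mean 2.0
```

Note that the rectifier removes the model's bias (here −0.9), so the CI is valid even though the predictions themselves are systematically wrong.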

MCML Authors

Yuxin Wang

Artificial Intelligence in Management


Maresa Schröder

Artificial Intelligence in Management


Dennis Frauen

Artificial Intelligence in Management


Jonas Schweisthal

Artificial Intelligence in Management


Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1334]
Ö. Turgut, P. Müller, P. Hager, S. Shit, S. Starck, M. J. Menten, E. Martens and D. Rückert.
Unlocking the diagnostic potential of electrocardiograms through information transfer from cardiac magnetic resonance imaging.
Medical Image Analysis 101.103451 (Apr. 2025). DOI GitHub
Abstract

Cardiovascular diseases (CVD) can be diagnosed using various diagnostic modalities. The electrocardiogram (ECG) is a cost-effective and widely available diagnostic aid that provides functional information of the heart. However, its ability to classify and spatially localise CVD is limited. In contrast, cardiac magnetic resonance (CMR) imaging provides detailed structural information of the heart and thus enables evidence-based diagnosis of CVD, but long scan times and high costs limit its use in clinical routine. In this work, we present a deep learning strategy for cost-effective and comprehensive cardiac screening solely from ECG. Our approach combines multimodal contrastive learning with masked data modelling to transfer domain-specific information from CMR imaging to ECG representations. In extensive experiments using data from 40,044 UK Biobank subjects, we demonstrate the utility and generalisability of our method for subject-specific risk prediction of CVD and the prediction of cardiac phenotypes using only ECG data. Specifically, our novel multimodal pre-training paradigm improves performance by up to 12.19% for risk prediction and 27.59% for phenotype prediction. In a qualitative analysis, we demonstrate that our learned ECG representations incorporate information from CMR image regions of interest.
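The multimodal contrastive component can be sketched as a symmetric InfoNCE (CLIP-style) loss between paired embeddings of the two modalities, e.g. ECG and CMR views of the same subject; the masked data modelling part and all architectural details are omitted, and the temperature value is an assumption.

```python
import numpy as np

def logsumexp(x, axis):
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def clip_loss(emb_a, emb_b, temperature=0.07):
    """Symmetric InfoNCE between two batches of paired embeddings:
    each row must pick out its own partner among all candidates in
    the batch, in both directions."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature               # pairwise cosine similarities
    loss_ab = -np.mean(np.diag(logits - logsumexp(logits, axis=1)))
    loss_ba = -np.mean(np.diag(logits.T - logsumexp(logits.T, axis=1)))
    return (loss_ab + loss_ba) / 2

rng = np.random.default_rng(5)
ecg = rng.normal(size=(16, 32))                        # toy ECG embeddings
aligned = clip_loss(ecg, ecg + 0.01 * rng.normal(size=(16, 32)))
shuffled = clip_loss(ecg, np.roll(ecg, 1, axis=0))     # mismatched pairs
print(aligned < shuffled)  # matched pairs give the lower loss
```

Minimizing this loss pulls each subject's ECG embedding toward their own CMR embedding and away from everyone else's, which is how structural information from imaging can shape the ECG representation.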

MCML Authors
Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[1333]
Y.-J. Li, M. Gladkova, Y. Xia, R. Wang and D. Cremers.
VXP: Voxel-Cross-Pixel Large-scale Image-LiDAR Place Recognition.
3DV 2025 - 12th International Conference on 3D Vision. Singapore, Mar 25-28, 2025. To be published. Preprint available. arXiv
Abstract

Recent works on global place recognition treat the task as a retrieval problem, where an off-the-shelf global descriptor is commonly designed in image-based and LiDAR-based modalities. However, it is non-trivial to perform accurate image-LiDAR global place recognition since extracting consistent and robust global descriptors from different domains (2D images and 3D point clouds) is challenging. To address this issue, we propose a novel Voxel-Cross-Pixel (VXP) approach, which establishes voxel and pixel correspondences in a self-supervised manner and brings them into a shared feature space. Specifically, VXP is trained in a two-stage manner that first explicitly exploits local feature correspondences and enforces similarity of global descriptors. Extensive experiments on three benchmarks (Oxford RobotCar, ViViD++ and KITTI) demonstrate that our method surpasses state-of-the-art cross-modal retrieval by a large margin.

MCML Authors
Link to website

Mariia Gladkova

Computer Vision & Artificial Intelligence

Link to website

Yan Xia

Dr.

Computer Vision & Artificial Intelligence

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1332]
L. Zumeta-Olaskoaga, A. Bender and D.-J. Lee.
Flexible modelling of time-varying exposures in event history analysis.
DAGStat 2025 - 7th Joint Statistical Meeting of the Deutsche Arbeitsgemeinschaft Statistik. Berlin, Germany, Mar 24-28, 2025. Poster presentation. Full paper available. DOI
Abstract

We present a flexible modelling approach to analyse time-varying exposures and recurrent events in team sports injuries. The approach is based on the piece-wise exponential additive mixed model where the effects of past exposures (i.e. high-intensity training loads) may accumulate over time and present complex forms of association. In order to identify a relevant time window at which past exposures have an impact on the current risk, we propose a penalty approach. We conduct a simulation study to evaluate the performance of the proposed model, under different true weight functions and different levels of heterogeneity between recurrent events. Finally, we illustrate the approach with a case study application involving an elite male football team participating in the Spanish LaLiga competition. The cohort includes time-loss injuries and external training load variables tracked by Global Positioning System devices, during the seasons 2017–2018 and 2018–2019.
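The core modelling idea — past exposures accumulating with a lag-dependent weight — can be sketched as a weighted cumulative exposure. The decay function below is a hypothetical stand-in for the estimated weight function, purely for illustration:

```python
import math

def cumulative_effect(loads, weight, t):
    """Weighted cumulative exposure at time t: every past load contributes
    according to a lag-dependent weight w(t - s)."""
    return sum(weight(t - s) * load for s, load in enumerate(loads[: t + 1]))

# toy weight function: the effect of a training load decays with its lag
decay = lambda lag: math.exp(-0.3 * lag)

daily_loads = [10.0, 0.0, 20.0, 5.0]   # e.g. daily external training loads
effect_today = cumulative_effect(daily_loads, decay, t=3)
```

In the paper the weight function is estimated from data (with a penalty to identify the relevant lag window) rather than fixed in advance.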

MCML Authors
Link to website

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)


[1331]
L. Bothmann, S. Dandl, J. M. Alvarez, P. A. Boustani and B. Bischl.
Privilege Scores for Fairness-Aware ML.
DAGStat 2025 - 7th Joint Statistical Meeting of the Deutsche Arbeitsgemeinschaft Statistik. Berlin, Germany, Mar 24-28, 2025. Poster presentation. Preprint available.
Abstract

Bias-preserving methods in fairness-aware machine learning (fairML) focus on metrics that prioritize formal equality by balancing error rates across subgroups. These methods can perpetuate historical discrimination embedded in real-world data. In contrast, bias-transforming methods aim for substantive equality by actively addressing historical inequalities. As a contribution to bias-transforming methods, we introduce the concept of privilege scores, a novel approach to identifying and quantifying individual privilege in machine learning tasks. Privilege scores use causal inference techniques to compare real-world outcomes to those in a ‘fair’ world in which the protected attributes do not influence the target variable. This individual-level perspective provides actionable insights for applications such as affirmative action and beyond. Key contributions include (1) the formalization of privilege scores, (2) a methodological framework for estimation with uncertainty quantification via confidence intervals, (3) an interpretable machine learning approach for understanding privilege score contributions, and (4) a novel in-processing method, Multi-PrivScore, to mitigate model-level discrimination during model training. Experiments on simulated and real-world data demonstrate the usefulness of privilege scores. Overall, our work highlights privilege scores as a versatile tool for assessing and mitigating historical discrimination in various machine learning applications.

MCML Authors
Link to website

Ludwig Bothmann

Dr.

Statistical Learning & Data Science

Link to website

Philip Amir Boustani

Statistical Learning & Data Science

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[1330]
R. Amoroso, G. Zhang, R. Koner, L. Baraldi, R. Cucchiara and V. Tresp.
Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries.
WACV 2025 - IEEE/CVF Winter Conference on Applications of Computer Vision. Tucson, AZ, USA, Feb 28-Mar 04, 2025. To be published. Preprint available. arXiv
Abstract

Video Question Answering (Video QA) is a challenging video understanding task that requires models to comprehend entire videos, identify the most relevant information based on contextual cues from a given question, and reason accurately to provide answers. Recent advancements in Multimodal Large Language Models (MLLMs) have transformed video QA by leveraging their exceptional commonsense reasoning capabilities. This progress is largely driven by the effective alignment between visual data and the language space of MLLMs. However, for video QA, an additional space-time alignment poses a considerable challenge for extracting question-relevant information across frames. In this work, we investigate diverse temporal modeling techniques to integrate with MLLMs, aiming to achieve question-guided temporal modeling that leverages pre-trained visual and textual alignment in MLLMs. We propose T-Former, a novel temporal modeling method that creates a question-guided temporal bridge between frame-wise visual perception and the reasoning capabilities of LLMs. Our evaluation across multiple video QA benchmarks demonstrates that T-Former competes favorably with existing temporal modeling approaches and aligns with recent advancements in video QA.

MCML Authors
Link to website

Gengyuan Zhang

Database Systems & Data Mining

Link to website

Rajat Koner

Database Systems & Data Mining

Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[1329]
A. H. Berger, L. Lux, S. Shit, I. Ezhov, G. Kaissis, M. Menten, D. Rückert and J. C. Paetzold.
Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers.
WACV 2025 - IEEE/CVF Winter Conference on Applications of Computer Vision. Tucson, AZ, USA, Feb 28-Mar 04, 2025. To be published. Preprint available. arXiv
Abstract

Direct image-to-graph transformation is a challenging task that involves solving object detection and relationship prediction in a single model. Due to this task’s complexity, large training datasets are rare in many domains, making the training of deep-learning methods challenging. This data sparsity necessitates transfer learning strategies akin to the state-of-the-art in general computer vision. In this work, we introduce a set of methods enabling cross-domain and cross-dimension learning for image-to-graph transformers. We propose (1) a regularized edge sampling loss to effectively learn object relations in multiple domains with different numbers of edges, (2) a domain adaptation framework for image-to-graph transformers aligning image- and graph-level features from different domains, and (3) a projection function that allows using 2D data for training 3D transformers. We demonstrate our method’s utility in cross-domain and cross-dimension experiments, where we utilize labeled data from 2D road networks for simultaneous learning in vastly different target domains. Our method consistently outperforms standard transfer learning and self-supervised pretraining on challenging benchmarks, such as retinal or whole-brain vessel graph extraction.

MCML Authors
Link to website

Laurin Lux

Artificial Intelligence in Healthcare and Medicine

Link to Profile Georgios Kaissis

Georgios Kaissis

Dr.

Privacy-Preserving and Trustworthy AI

Link to Profile Martin Menten

Martin Menten

Dr.

AI in Medicine

Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[1328]
H. Chen, D. Krompass, J. Gu and V. Tresp.
FedPop: Federated Population-based Hyperparameter Tuning.
AAAI 2025 - 39th Conference on Artificial Intelligence. Philadelphia, PA, USA, Feb 25-Mar 04, 2025. To be published. Preprint available. arXiv
Abstract

Federated Learning (FL) is a distributed machine learning (ML) paradigm, in which multiple clients collaboratively train ML models without centralizing their local data. Similar to conventional ML pipelines, the client local optimization and server aggregation procedure in FL are sensitive to the hyperparameter (HP) selection. Despite extensive research on tuning HPs for centralized ML, these methods yield suboptimal results when employed in FL. This is mainly because their ‘training-after-tuning’ framework is unsuitable for FL with limited client computation power. While some approaches have been proposed for HP-Tuning in FL, they are limited to the HPs for client local updates. In this work, we propose a novel HP-tuning algorithm, called Federated Population-based Hyperparameter Tuning (FedPop), to address this vital yet challenging problem. FedPop employs population-based evolutionary algorithms to optimize the HPs, which accommodates various HP types at both the client and server sides. Compared with prior tuning methods, FedPop employs an online ‘tuning-while-training’ framework, offering computational efficiency and enabling the exploration of a broader HP search space. Our empirical validation on the common FL benchmarks and complex real-world FL datasets, including full-sized Non-IID ImageNet-1K, demonstrates the effectiveness of the proposed method, which substantially outperforms the concurrent state-of-the-art HP-tuning methods in FL.
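Population-based HP tuning of this kind maintains a population of configurations and evolves it while training proceeds. A minimal sketch of one such evolutionary step (toy objective and perturbation range are assumptions, not FedPop's actual schedule):

```python
import random

def tuning_round(population, evaluate, rng):
    """One 'tuning-while-training' step (sketch): score every configuration,
    then replace the worst member with a perturbed copy of the best."""
    ranked = sorted(population, key=evaluate, reverse=True)
    best = ranked[0]
    mutated = {k: v * rng.uniform(0.8, 1.25) for k, v in best.items()}
    return ranked[:-1] + [mutated]

rng = random.Random(0)
# toy stand-in for federated validation: score peaks at lr = 0.1
evaluate = lambda hp: -abs(hp["lr"] - 0.1)
population = [{"lr": rng.uniform(0.001, 1.0)} for _ in range(6)]
for _ in range(200):
    population = tuning_round(population, evaluate, rng)
best_lr = max(population, key=evaluate)["lr"]
```

Because tuning happens online, no separate tuning-then-retraining phase is needed — the property the abstract contrasts with 'training-after-tuning' methods.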

MCML Authors
Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[1327]
A. Davtyan, S. Sameni, B. Ommer and P. Favaro.
CAGE: Unsupervised Visual Composition and Animation for Controllable Video Generation.
AAAI 2025 - 39th Conference on Artificial Intelligence. Philadelphia, PA, USA, Feb 25-Mar 04, 2025. To be published. Preprint available. arXiv GitHub
Abstract

In this work we propose a novel method for unsupervised controllable video generation. Once trained on a dataset of unannotated videos, at inference our model is capable of both composing scenes of predefined object parts and animating them in a plausible and controlled way. This is achieved by conditioning video generation on a randomly selected subset of local pre-trained self-supervised features during training. We call our model CAGE for visual Composition and Animation for video GEneration. We conduct a series of experiments to demonstrate capabilities of CAGE in various settings.

MCML Authors
Link to Profile Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[1326]
M. Gui, J. Schusterbauer, U. Prestel, P. Ma, D. Kotovenko, O. Grebenkova, S. A. Baumann, V. Hu and B. Ommer.
DepthFM: Fast Monocular Depth Estimation with Flow Matching.
AAAI 2025 - 39th Conference on Artificial Intelligence. Philadelphia, PA, USA, Feb 25-Mar 04, 2025. To be published. Preprint available.
Abstract

Current discriminative depth estimation methods often produce blurry artifacts, while generative approaches suffer from slow sampling due to curvatures in the noise-to-depth transport. Our method addresses these challenges by framing depth estimation as a direct transport between image and depth distributions. We are the first to explore flow matching in this field, and we demonstrate that its interpolation trajectories enhance both training and sampling efficiency while preserving high performance. While generative models typically require extensive training data, we mitigate this dependency by integrating external knowledge from a pre-trained image diffusion model, enabling effective transfer even across differing objectives. To further boost our model performance, we employ synthetic data and utilize image-depth pairs generated by a discriminative model on an in-the-wild image dataset. As a generative model, our model can reliably estimate depth confidence, which provides an additional advantage. Our approach achieves competitive zero-shot performance on standard benchmarks of complex natural scenes while improving sampling efficiency and only requiring minimal synthetic data for training.
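The flow-matching formulation mentioned above regresses a velocity field along straight interpolation paths between the two distributions. A minimal sketch of the training pair construction (toy vectors; the real model operates on images with a neural velocity network):

```python
def flow_matching_pair(x0, x1, t):
    """Along the straight path x_t = (1 - t) * x0 + t * x1, the regression
    target for the velocity network is the constant x1 - x0."""
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity_target = [b - a for a, b in zip(x0, x1)]
    return x_t, velocity_target

noise = [0.0, 1.0]   # sample from the source distribution
depth = [2.0, 3.0]   # toy 'depth map' (flattened)
x_half, v = flow_matching_pair(noise, depth, t=0.5)
```

The straight trajectories are what make sampling efficient: integrating a nearly constant velocity needs far fewer steps than following the curved paths of diffusion models.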

MCML Authors
Link to website

Pingchuan Ma

Machine Vision & Learning

Link to website

Olga Grebenkova

Machine Vision & Learning

Link to Profile Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[1325]
Z. Li, S. S. Cranganore, N. Youngblut and N. Kilbertus.
Whole Genome Transformer for Gene Interaction Effects in Microbiome Habitat Specificity.
AAAI 2025 - 39th Conference on Artificial Intelligence. Philadelphia, PA, USA, Feb 25-Mar 04, 2025. To be published. Preprint available. arXiv
Abstract

Leveraging the vast genetic diversity within microbiomes offers unparalleled insights into complex phenotypes, yet the task of accurately predicting and understanding such traits from genomic data remains challenging. We propose a framework taking advantage of existing large models for gene vectorization to predict habitat specificity from entire microbial genome sequences. Based on our model, we develop attribution techniques to elucidate gene interaction effects that drive microbial adaptation to diverse environments. We train and validate our approach on a large dataset of high quality microbiome genomes from different habitats. We not only demonstrate solid predictive performance, but also how sequence-level information of entire genomes allows us to identify gene associations underlying complex phenotypes. Our attribution recovers known important interaction networks and proposes new candidates for experimental follow up.

MCML Authors
Link to website

Zhufeng Li

Ethics in Systems Design and Machine Learning

Link to Profile Niki Kilbertus

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning


[1324]
P. Ma, L. Rietdorf, D. Kotovenko, V. T. Hu and B. Ommer.
Does VLM Classification Benefit from LLM Description Semantics?
AAAI 2025 - 39th Conference on Artificial Intelligence. Philadelphia, PA, USA, Feb 25-Mar 04, 2025. To be published. Preprint available. arXiv
Abstract

Accurately describing images with text is a foundation of explainable AI. Vision-Language Models (VLMs) like CLIP have recently addressed this by aligning images and texts in a shared embedding space, expressing semantic similarities between vision and language embeddings. VLM classification can be improved with descriptions generated by Large Language Models (LLMs). However, it is difficult to determine the contribution of actual description semantics, as the performance gain may also stem from a semantic-agnostic ensembling effect, where multiple modified text prompts act as a noisy test-time augmentation for the original one. We propose an alternative evaluation scenario to decide if a performance boost of LLM-generated descriptions is caused by such a noise augmentation effect or rather by genuine description semantics. The proposed scenario avoids noisy test-time augmentation and ensures that genuine, distinctive descriptions cause the performance boost. Furthermore, we propose a training-free method for selecting discriminative descriptions that work independently of classname-ensembling effects. Our approach identifies descriptions that effectively differentiate classes within a local CLIP label neighborhood, improving classification accuracy across seven datasets. Additionally, we provide insights into the explainability of description-based image classification with VLMs.

MCML Authors
Link to website

Pingchuan Ma

Machine Vision & Learning

Link to Profile Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[1323]
Y. Zhang, Z. Ma, Y. Ma, Z. Han, Y. Wu and V. Tresp.
WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration.
AAAI 2025 - 39th Conference on Artificial Intelligence. Philadelphia, PA, USA, Feb 25-Mar 04, 2025. To be published. Preprint available. arXiv
Abstract

LLM-based autonomous agents often fail to execute complex web tasks that require dynamic interaction due to the inherent uncertainty and complexity of these environments. Existing LLM-based web agents typically rely on rigid, expert-designed policies specific to certain states and actions, which lack the flexibility and generalizability needed to adapt to unseen tasks. In contrast, humans excel by exploring unknowns, continuously adapting strategies, and resolving ambiguities through exploration. To emulate human-like adaptability, web agents need strategic exploration and complex decision-making. Monte Carlo Tree Search (MCTS) is well-suited for this, but classical MCTS struggles with vast action spaces, unpredictable state transitions, and incomplete information in web tasks. In light of this, we develop WebPilot, a multi-agent system with a dual optimization strategy that improves MCTS to better handle complex web environments. Specifically, the Global Optimization phase involves generating a high-level plan by breaking down tasks into manageable subtasks and continuously refining this plan, thereby focusing the search process and mitigating the challenges posed by vast action spaces in classical MCTS. Subsequently, the Local Optimization phase executes each subtask using a tailored MCTS designed for complex environments, effectively addressing uncertainties and managing incomplete information. Experimental results on WebArena and MiniWoB++ demonstrate the effectiveness of WebPilot. Notably, on WebArena, WebPilot achieves SOTA performance with GPT-4, achieving a 93% relative increase in success rate over the concurrent tree search-based method. WebPilot marks a significant advancement in general autonomous agent capabilities, paving the way for more advanced and reliable decision-making in practical environments.

MCML Authors
Link to website

Yao Zhang

Database Systems & Data Mining

Link to website

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[1322]
D. Tschernutter and S. Feuerriegel.
Data-driven dynamic police patrolling: An efficient Monte Carlo tree search.
European Journal of Operational Research 321.1 (Feb. 2025). DOI
Abstract

Crime is responsible for major financial losses and serious harm to the well-being of individuals, and, hence, a crucial task of police operations is effective patrolling. Yet, in existing decision models aimed at police operations, microscopic routing decisions from patrolling are not considered, and, furthermore, the objective is limited to surrogate metrics (e.g., response time) instead of crime prevention. In this paper, we thus formalize the decision problem of dynamic police patrolling as a Markov decision process that models microscopic routing decisions, so that the expected number of prevented crimes is maximized. We experimentally show that standard solution approaches for our decision problem are not scalable to real-world settings. As a remedy, we present a tailored and highly efficient Monte Carlo tree search algorithm. We then demonstrate our algorithm numerically using real-world crime data from Chicago and show that the decision-making by our algorithm offers significant improvements for crime prevention over patrolling tactics from current practice. Informed by our results, we finally discuss implications for improving the patrolling tactics in police operations.
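At the heart of Monte Carlo tree search is an upper-confidence rule that trades off exploring rarely tried actions against exploiting good ones. A depth-one bandit sketch of that selection rule on an invented two-route patrolling problem (the route names and probabilities are illustrative, not the paper's Chicago data):

```python
import math
import random

def ucb1(total_reward, visits, rounds, c=1.4):
    """UCB1 score: average reward plus an exploration bonus that shrinks
    as an action is tried more often."""
    if visits == 0:
        return float("inf")
    return total_reward / visits + c * math.sqrt(math.log(rounds) / visits)

rng = random.Random(0)
# toy one-step problem: probability that patrolling a route prevents a crime
routes = {"hotspot": 0.7, "quiet": 0.2}
stats = {r: [0.0, 0] for r in routes}   # [total reward, visits]
for t in range(1, 2001):
    choice = max(routes, key=lambda r: ucb1(stats[r][0], stats[r][1], t))
    reward = 1.0 if rng.random() < routes[choice] else 0.0
    stats[choice][0] += reward
    stats[choice][1] += 1
```

A full MCTS applies this rule recursively down a tree of routing decisions and backs simulated rewards up the tree; the sketch shows only the one-step selection behaviour.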

MCML Authors
Link to Profile Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1321]
X.-Y. Tong, R. Dong and X. Zhu.
Global high categorical resolution land cover mapping via weak supervision.
ISPRS Journal of Photogrammetry and Remote Sensing 220 (Feb. 2025). DOI GitHub
Abstract

Land cover information is indispensable for advancing the United Nations’ sustainable development goals, and land cover mapping under a more detailed category system would significantly contribute to economic livelihood tracking and environmental degradation measurement. However, the substantial difficulty in acquiring fine-grained training data makes the implementation of this task particularly challenging. Here, we propose to combine fully labeled source domain and weakly labeled target domain for weakly supervised domain adaptation (WSDA). This is beneficial as the utilization of sparse and coarse weak labels can considerably alleviate the labor required for precise and detailed land cover annotation. Specifically, we introduce the Prototype-based pseudo-label Rectification and Expansion (PRE) approach, which leverages the prototypes (i.e., the class-wise feature centroids) as the bridge to connect sparse labels and global feature distributions. According to the feature distances to the prototypes, the confidence of pseudo-labels predicted in the unlabeled regions of the target domain is assessed. This confidence is then utilized to guide the dynamic expansion and rectification of pseudo-labels. Based on PRE, we carry out high categorical resolution land cover mapping for 10 cities in different regions around the world, severally using PlanetScope, Gaofen-1, and Sentinel-2 satellite images. In the study areas, we achieve cross-sensor, cross-category, and cross-continent WSDA, with the overall accuracy exceeding 80%. The promising results indicate that PRE is capable of reducing the dependency of land cover classification on high-quality annotations, thereby improving label efficiency. We expect our work to enable global fine-grained land cover mapping, which in turn promotes Earth observation to provide more precise and thorough information for environmental monitoring.
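The prototype-based confidence step can be sketched in a few lines: class centroids are computed from the sparse labels, and an unlabeled pixel's pseudo-label confidence falls off with its feature distance to each centroid. The class names and feature values below are invented for illustration:

```python
import math

def prototype(vectors):
    """Class-wise feature centroid."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def pseudo_label_confidence(feature, prototypes):
    """Softmax over negative distances to the prototypes: the closer an
    unlabeled pixel's feature is to a class centroid, the more confident
    the pseudo-label for that class."""
    exps = {c: math.exp(-math.dist(feature, p)) for c, p in prototypes.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

protos = {
    "forest": prototype([[0.9, 0.1], [1.1, -0.1]]),
    "urban": prototype([[-1.0, 0.0], [-1.2, 0.2]]),
}
conf = pseudo_label_confidence([0.8, 0.0], protos)
```

Thresholding such confidences decides which pseudo-labels to expand into unlabeled regions and which to rectify, per the PRE description above.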

MCML Authors
Link to Profile Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[1320]
R. Litschko, O. Kraus, V. Blaschke and B. Plank.
Cross-Dialect Information Retrieval: Information Access in Low-Resource and High-Variance Languages.
COLING 2025 - The 31st International Conference on Computational Linguistics. Abu Dhabi, United Arab Emirates, Jan 19-24, 2025. URL
Abstract

A large amount of local and culture-specific knowledge (e.g., people, traditions, food) can only be found in documents written in dialects. While there has been extensive research conducted on cross-lingual information retrieval (CLIR), the field of cross-dialect retrieval (CDIR) has received limited attention. Dialect retrieval poses unique challenges due to the limited availability of resources to train retrieval models and the high variability in non-standardized languages. We study these challenges using the example of German dialects and introduce the first German dialect retrieval dataset, dubbed WikiDIR, which consists of seven German dialects extracted from Wikipedia. Using WikiDIR, we demonstrate the weakness of lexical methods in dealing with high lexical variation in dialects. We further show that the commonly used zero-shot cross-lingual transfer approach with multilingual encoders does not transfer well to extremely low-resource setups, motivating the need for resource-lean and dialect-specific retrieval models. We finally demonstrate that (document) translation is an effective way to reduce the dialect gap in CDIR.

MCML Authors
Link to website

Robert Litschko

Artificial Intelligence and Computational Linguistics

Link to website

Verena Blaschke

Artificial Intelligence and Computational Linguistics

Link to Profile Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1319]
Y. Liu, C. Ma, H. Ye and H. Schütze.
TransMI: A Framework to Create Strong Baselines from Multilingual Pretrained Language Models for Transliterated Data.
COLING 2025 - The 31st International Conference on Computational Linguistics. Abu Dhabi, United Arab Emirates, Jan 19-24, 2025. URL GitHub
Abstract

Transliterating related languages that use different scripts into a common script is effective in improving crosslingual transfer in downstream tasks. However, this methodology often makes pretraining a model from scratch unavoidable, as transliteration brings about new subwords not covered in existing multilingual pretrained language models (mPLMs). This is undesirable because pretraining requires a large computation budget. A more promising way is to make full use of available mPLMs. To this end, this paper proposes a simple but effective framework: Transliterate-Merge-Initialize (TransMI), which can create a strong baseline well-suited for data that is transliterated into a common script by exploiting an mPLM and its accompanying tokenizer. TransMI has three stages: (a) transliterate the vocabulary of an mPLM into a common script; (b) merge the new vocabulary with the original vocabulary; and (c) initialize the embeddings of the new subwords. We applied TransMI to three recent strong mPLMs, and our experiments demonstrate that TransMI not only preserves their ability to handle non-transliterated data, but also enables the models to effectively process transliterated data: the results show a consistent improvement of 3% to 34%, varying across different models and tasks.
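The three TransMI stages can be mimicked on a toy vocabulary. The character table below is an invented Cyrillic-to-Latin mapping, and copy-initialization is one simple choice for stage (c); neither is claimed to be the paper's exact scheme:

```python
def transliterate(token):
    """Toy Cyrillic-to-Latin mapping (illustrative, not a real scheme)."""
    table = str.maketrans({"д": "d", "о": "o", "м": "m", "а": "a"})
    return token.translate(table)

def transliterate_merge_initialize(embeddings):
    """Sketch of the three stages: (a) transliterate each subword,
    (b) merge new subwords with the original vocabulary, and
    (c) initialize new embeddings from their source subword."""
    merged = dict(embeddings)            # (b) keep the original vocabulary
    for token, emb in embeddings.items():
        new = transliterate(token)       # (a) transliterate
        if new not in merged:
            merged[new] = list(emb)      # (c) copy-initialize
    return merged

vocab = {"дом": [0.1, 0.2], "dom": [0.3, 0.4], "мода": [0.5, 0.6]}
merged_vocab = transliterate_merge_initialize(vocab)
```

Because the original entries are kept, the merged model still handles non-transliterated text — the property the abstract highlights.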

MCML Authors
Link to website

Yihong Liu

Statistical NLP and Deep Learning

Link to website

Chunlan Ma

Statistical NLP and Deep Learning

Link to website

Haotian Ye

Statistical NLP and Deep Learning

Link to Profile Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1318]
Y. Liu, M. Wang, A. H. Kargaran, A. Imani, O. Xhelili, H. Ye, C. Ma, F. Yvon and H. Schütze.
How Transliterations Improve Crosslingual Alignment.
COLING 2025 - The 31st International Conference on Computational Linguistics. Abu Dhabi, United Arab Emirates, Jan 19-24, 2025. URL
Abstract

Recent studies have shown that post-aligning multilingual pretrained language models (mPLMs) using alignment objectives on both original and transliterated data can improve crosslingual alignment. This improvement further leads to better crosslingual transfer performance. However, it remains unclear how and why a better crosslingual alignment is achieved, as this technique only involves transliterations, and does not use any parallel data. This paper attempts to explicitly evaluate the crosslingual alignment and identify the key elements in transliteration-based approaches that contribute to better performance. For this, we train multiple models under varying setups for two pairs of related languages: (1) Polish and Ukrainian and (2) Hindi and Urdu. To assess alignment, we define four types of similarities based on sentence representations. Our experiments show that adding transliterations alone improves the overall similarities, even for random sentence pairs. With the help of auxiliary alignment objectives, especially the contrastive objective, the model learns to distinguish matched from random pairs, leading to better alignments. However, we also show that better alignment does not always yield better downstream performance, suggesting that further research is needed to clarify the connection between alignment and performance.

MCML Authors
Link to website

Yihong Liu

Statistical NLP and Deep Learning

Link to website

Mingyang Wang

Statistical NLP and Deep Learning

Link to website

Amir Hossein Kargaran

Statistical NLP and Deep Learning

Link to website

Ayyoob Imani

Statistical NLP and Deep Learning

Link to website

Haotian Ye

Statistical NLP and Deep Learning

Link to website

Chunlan Ma

Statistical NLP and Deep Learning

Link to Profile Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1317]
A. Muñoz-Ortiz, V. Blaschke and B. Plank.
Evaluating Pixel Language Models on Non-Standardized Languages.
COLING 2025 - The 31st International Conference on Computational Linguistics. Abu Dhabi, United Arab Emirates, Jan 19-24, 2025.
Abstract

We explore the potential of pixel-based models for transfer learning from standard languages to dialects. These models convert text into images that are divided into patches, enabling a continuous vocabulary representation that proves especially useful for out-of-vocabulary words common in dialectal data. Using German as a case study, we compare the performance of pixel-based models to token-based models across various syntactic and semantic tasks. Our results show that pixel-based models outperform token-based models in part-of-speech tagging, dependency parsing and intent detection for zero-shot dialect evaluation by up to 26 percentage points in some scenarios, though not in Standard German. However, pixel-based models fall short in topic classification. These findings emphasize the potential of pixel-based models for handling dialectal data, though further research should be conducted to assess their effectiveness in various linguistic contexts.

MCML Authors
Link to website

Verena Blaschke

Artificial Intelligence and Computational Linguistics

Link to Profile Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1316]
Y. Zhang, V. Hangya and A. Fraser.
LLM Sensitivity Challenges in Abusive Language Detection: Instruction-Tuned vs. Human Feedback.
COLING 2025 - The 31st International Conference on Computational Linguistics. Abu Dhabi, United Arab Emirates, Jan 19-24, 2025. URL
Abstract

The capacity of large language models (LLMs) to understand and distinguish socially unacceptable texts enables them to play a promising role in abusive language detection. However, various factors can affect their sensitivity. In this work, we test whether LLMs have an unintended bias in abusive language detection, i.e., whether they predict more or less of a given abusive class than expected in zero-shot settings. Our results show that instruction-tuned LLMs tend to under-predict positive classes, since datasets used for tuning are dominated by the negative class. In contrast, models fine-tuned with human feedback tend to be overly sensitive. In an exploratory approach to mitigate these issues, we show that label frequency in the prompt helps mitigate the significant over-prediction.

MCML Authors
Link to Profile Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[1315]
E. Garces Arias, M. Li, C. Heumann and M. Aßenmacher.
Decoding Decoded: Understanding Hyperparameter Effects in Open-Ended Text Generation.
COLING 2025 - The 31st International Conference on Computational Linguistics. Abu Dhabi, United Arab Emirates, Jan 19-24, 2025. To be published. Preprint available. arXiv
Abstract

Decoding strategies for large language models (LLMs) are a critical but often underexplored aspect of text generation tasks. Since LLMs produce probability distributions over the entire vocabulary, various decoding methods have been developed to transform these probabilities into coherent and fluent text, each with its own set of hyperparameters. In this study, we present a large-scale, comprehensive analysis of how hyperparameter selection affects text quality in open-ended text generation across multiple LLMs, datasets, and evaluation metrics. Through an extensive sensitivity analysis, we provide practical guidelines for hyperparameter tuning and demonstrate the substantial influence of these choices on text quality. Using three established datasets, spanning factual domains (e.g., news) and creative domains (e.g., fiction), we show that hyperparameter tuning significantly impacts generation quality, though its effects vary across models and tasks. We offer in-depth insights into these effects, supported by both human evaluations and a synthesis of widely-used automatic evaluation metrics.
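Two of the decoding hyperparameters such studies vary — temperature and nucleus (top-p) truncation — compose as follows. This is a generic sketch of the standard technique over a toy vocabulary, not the paper's evaluation code:

```python
import math
import random

def sample_next(logits, temperature=1.0, top_p=1.0, rng=None):
    """Temperature-scale the logits, keep the smallest set of tokens whose
    cumulative probability reaches top_p, then sample from that nucleus."""
    rng = rng or random.Random(0)
    tokens = list(logits)
    scaled = [logits[tok] / temperature for tok in tokens]
    m = max(scaled)                              # for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    ranked = sorted(zip(tokens, (e / z for e in exps)), key=lambda kv: -kv[1])
    nucleus, cumulative = [], 0.0
    for tok, p in ranked:
        nucleus.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    threshold = rng.random() * sum(p for _, p in nucleus)
    acc = 0.0
    for tok, p in nucleus:
        acc += p
        if acc >= threshold:
            return tok
    return nucleus[-1][0]

logits = {"the": 5.0, "a": 2.0, "zebra": -3.0}
```

Low temperature sharpens the distribution and small top-p prunes the tail, so both push generation toward the most likely continuations; the paper studies how such choices interact with quality across models and domains.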

MCML Authors
Link to website

Esteban Garces Arias

Statistical Learning & Data Science

Link to website

Matthias Aßenmacher

Dr.

Statistical Learning & Data Science


[1314]
V. Blaschke, F. Körner and B. Plank.
Add Noise, Tasks, or Layers? MaiNLP at the VarDial 2025 Shared Task on Norwegian Dialectal Slot and Intent Detection.
VarDial @COLING 2025 - 12th Workshop on NLP for Similar Languages, Varieties and Dialects at the 31st International Conference on Computational Linguistics (COLING 2025). Abu Dhabi, United Arab Emirates, Jan 19-24, 2025. URL
Abstract

Slot and intent detection (SID) is a classic natural language understanding task. Despite this, research has only more recently begun focusing on SID for dialectal and colloquial varieties. Many approaches for low-resource scenarios have not yet been applied to dialectal SID data, or compared to each other on the same datasets. We participate in the VarDial 2025 shared task on slot and intent detection in Norwegian varieties, and compare multiple set-ups: varying the training data (English, Norwegian, or dialectal Norwegian), injecting character-level noise, training on auxiliary tasks, and applying Layer Swapping, a technique in which layers of models fine-tuned on different datasets are assembled into a model. We find noise injection to be beneficial while the effects of auxiliary tasks are mixed. Though some experimentation was required to successfully assemble a model from layers, it worked surprisingly well; a combination of models trained on English and small amounts of dialectal data produced the most robust slot predictions. Our best models achieve 97.6% intent accuracy and 85.6% slot F1 in the shared task.
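Character-level noise injection of the kind found beneficial here can be sketched as follows (a toy illustration with made-up edit operations and rates, not the authors' implementation):

```python
import random

def inject_char_noise(text, noise_rate=0.1, rng=None):
    """Randomly delete, duplicate, or swap alphabetic characters,
    a simple way to make models robust to non-standard spellings."""
    rng = rng or random.Random(0)  # deterministic default for reproducibility
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < noise_rate:
            op = rng.choice(["delete", "duplicate", "swap"])
            if op == "duplicate":
                out.extend([c, c])
            elif op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], c])
                i += 1  # the swapped neighbour is consumed too
            elif op == "delete":
                pass  # drop the character
            else:
                out.append(c)  # swap at end of string: keep as-is
        else:
            out.append(c)
        i += 1
    return "".join(out)
```

Applied to standard-language training data, such perturbations roughly mimic dialectal spelling variation without requiring any dialectal annotations.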

MCML Authors
Link to website

Verena Blaschke

Artificial Intelligence and Computational Linguistics

Link to website

Felicia Körner

Artificial Intelligence and Computational Linguistics

Link to Profile Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1313]
X. Krückl, V. Blaschke and B. Plank.
Improving Dialectal Slot and Intent Detection with Auxiliary Tasks: A Multi-Dialectal Bavarian Case Study.
VarDial @COLING 2025 - 12th Workshop on NLP for Similar Languages, Varieties and Dialects at the 31st International Conference on Computational Linguistics (COLING 2025). Abu Dhabi, United Arab Emirates, Jan 19-24, 2025. URL
Abstract

Reliable slot and intent detection (SID) is crucial in natural language understanding for applications like digital assistants. Encoder-only transformer models fine-tuned on high-resource languages generally perform well on SID. However, they struggle with dialectal data, where no standardized form exists and training data is scarce and costly to produce. We explore zero-shot transfer learning for SID, focusing on multiple Bavarian dialects, for which we release a new dataset for the Munich dialect. We evaluate models trained on auxiliary tasks in Bavarian, and compare joint multi-task learning with intermediate-task training. We also compare three types of auxiliary tasks: token-level syntactic tasks, named entity recognition (NER), and language modelling. We find that the included auxiliary tasks have a more positive effect on slot filling than intent classification (with NER having the most positive effect), and that intermediate-task training yields more consistent performance gains. Our best-performing approach improves intent classification performance on Bavarian dialects by 5.1 and slot filling F1 by 8.4 percentage points.

MCML Authors
Link to website

Verena Blaschke

Artificial Intelligence and Computational Linguistics

Link to Profile Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1312]
A.-M. Lutgen, A. Plum, C. Purschke and B. Plank.
Neural Text Normalization for Luxembourgish Using Real-Life Variation Data.
VarDial @COLING 2025 - 12th Workshop on NLP for Similar Languages, Varieties and Dialects at the 31st International Conference on Computational Linguistics (COLING 2025). Abu Dhabi, United Arab Emirates, Jan 19-24, 2025. URL
Abstract

Orthographic variation is very common in Luxembourgish texts due to the absence of a fully-fledged standard variety. Additionally, developing NLP tools for Luxembourgish is a difficult task given the lack of annotated and parallel data, which is exacerbated by ongoing standardization. In this paper, we propose the first sequence-to-sequence normalization models using the ByT5 and mT5 architectures with training data obtained from word-level real-life variation data. We perform a fine-grained, linguistically-motivated evaluation to test byte-based, word-based and pipeline-based models for their strengths and weaknesses in text normalization. We show that our sequence model using real-life variation data is an effective approach for tailor-made normalization in Luxembourgish.

MCML Authors
Link to Profile Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1311]
A. Sanin, J. K. Flowers, T. H. Piotrowiak, F. Felsen, L. Merker, A. Ludwig, D. Bresser and H. S. Stein.
Integrating Automated Electrochemistry and High-Throughput Characterization with Machine Learning to Explore Si─Ge─Sn Thin-Film Lithium Battery Anodes.
Advanced Energy Materials Early View.2404961 (Jan. 2025). DOI
Abstract

High-performance batteries need accelerated discovery and optimization of new anode materials. Herein, we explore the Si─Ge─Sn ternary alloy system as a candidate fast-charging anode materials system by utilizing a scanning droplet cell (SDC) as an autonomous electrochemical characterization tool with the goal of subsequent upscaling. As the SDC is performing experiments sequentially, an exploration of the entire ternary space is unfeasible due to time constraints. Thus, closed-loop optimization, guided by real-time data analysis and sequential learning algorithms, is utilized to direct experiments. The lead material identified is scaled up to a coin cell to validate the findings from the autonomous millimeter-scale thin-film electrochemical experimentation. Explainable machine learning (ML) models incorporating data from high-throughput Raman spectroscopy and X-ray diffraction (XRD) are used to elucidate the effect of short and long-range ordering on material performance.

MCML Authors
Link to Profile Helge S. Stein

Helge S. Stein

Prof. Dr.

Digital Catalysis


[1310]
F. Bortolussi, H. Sandström, F. Partovi, J. Mikkilä, P. Rinke and M. Rissanen.
Technical note: Towards atmospheric compound identification in chemical ionization mass spectrometry with pesticide standards and machine learning.
Atmospheric Chemistry and Physics 25.1 (Jan. 2025). DOI
Abstract

Chemical ionization mass spectrometry (CIMS) is widely used in atmospheric chemistry studies. However, due to the complex interactions between reagent ions and target compounds, chemical understanding remains limited and compound identification difficult. In this study, we apply machine learning to a reference dataset of pesticides in two standard solutions to build a model that can provide insights from CIMS analyses in atmospheric science. The CIMS measurements were performed with an Orbitrap mass spectrometer coupled to a thermal desorption multi-scheme chemical ionization inlet unit (TD-MION-MS) with both negative and positive ionization modes utilizing Br−, , H3O+ and (CH3)2COH+ (AceH+) as reagent ions. We then trained two machine learning methods on these data: (1) random forest (RF) for classifying if a pesticide can be detected with CIMS and (2) kernel ridge regression (KRR) for predicting the expected CIMS signals. We compared their performance on five different representations of the molecular structure: the topological fingerprint (TopFP), the molecular access system keys (MACCS), a custom descriptor based on standard molecular properties (RDKitPROP), the Coulomb matrix (CM) and the many-body tensor representation (MBTR). The results indicate that MACCS outperforms the other descriptors. Our best classification model reaches a prediction accuracy of 0.85 ± 0.02 and a receiver operating characteristic curve area of 0.91 ± 0.01. Our best regression model reaches an accuracy of 0.44 ± 0.03 logarithmic units of the signal intensity. Subsequent feature importance analysis of the classifiers reveals that the most important sub-structures are NH and OH for the negative ionization schemes and nitrogen-containing groups for the positive ionization schemes.

MCML Authors
Link to Profile Patrick Rinke

Patrick Rinke

Prof. Dr.

AI-based Material Science


[1309]
S. Grosu, M. P. Fabritius, M. Winkelmann, D. Puhr-Westerheide, M. Ingenerf, S. Maurus, A. Graser, C. Schulz, T. Knösel, C. C. Cyran, J. Ricke, P. M. Kazmierczak, M. Ingrisch and P. Wesp.
Effect of artificial intelligence-aided differentiation of adenomatous and non-adenomatous colorectal polyps at CT colonography on radiologists’ therapy management.
European Radiology Early Access (Jan. 2025). DOI
Abstract

Objectives: Adenomatous colorectal polyps require endoscopic resection, as opposed to non-adenomatous hyperplastic colorectal polyps. This study aims to evaluate the effect of artificial intelligence (AI)-assisted differentiation of adenomatous and non-adenomatous colorectal polyps at CT colonography on radiologists’ therapy management.
Materials and methods: Five board-certified radiologists retrospectively evaluated CT colonography images with colorectal polyps of all sizes and morphologies and decided whether the depicted polyps required endoscopic resection. After a primary unassisted reading based on current guidelines, a second reading was performed with access to the classification of a radiomics-based random-forest AI model labelling each polyp as ‘non-adenomatous’ or ‘adenomatous’. Performance was evaluated using polyp histopathology as the reference standard.
Results: 77 polyps in 59 patients comprising 118 polyp image series (47% supine position, 53% prone position) were evaluated unassisted and AI-assisted by five independent board-certified radiologists, resulting in a total of 1180 readings (subsequent polypectomy: yes or no). AI-assisted readings had higher accuracy (76 ± 1% vs. 84 ± 1%), sensitivity (78 ± 6% vs. 85 ± 1%), and specificity (73 ± 8% vs. 82 ± 2%) in selecting polyps eligible for polypectomy (p < 0.001). Inter-reader agreement was improved in the AI-assisted readings (Fleiss’ kappa 0.69 vs. 0.92).
Conclusion: AI-based characterisation of colorectal polyps at CT colonography as a second reader might enable a more precise selection of polyps eligible for subsequent endoscopic resection. However, further studies are needed to confirm this finding and histopathologic polyp evaluation is still mandatory.
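Fleiss’ kappa, the inter-reader agreement statistic reported above, can be computed from per-polyp rating counts. A small self-contained sketch (not the study’s code; assumes the same number of raters per item):

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for agreement among multiple raters.

    `ratings` is a list of per-item category counts, e.g. [4, 1] means
    four of five readers chose category 0 and one chose category 1.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    categories = len(ratings[0])
    # Observed agreement: mean pairwise agreement per item.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ) / n_items
    # Chance agreement from the marginal category proportions.
    totals = [sum(row[j] for row in ratings) for j in range(categories)]
    p_j = [t / (n_items * n_raters) for t in totals]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)
```

Perfect agreement yields kappa = 1; values near 0.9, as in the AI-assisted readings, indicate near-unanimous decisions.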

MCML Authors
Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to website

Philipp Wesp

Dr.

Clinical Data Science in Radiology


[1308]
F. Tian, H. Zhang, Y. Tan, L. Zhu, L. Shen, K. Qian, B. Hu, B. W. Schuller and Y. Yamamoto.
An On-Board Executable Multi-Feature Transfer-Enhanced Fusion Model for Three-Lead EEG Sensor-Assisted Depression Diagnosis.
IEEE Journal of Biomedical and Health Informatics 29.1 (Jan. 2025). DOI
Abstract

The development of affective computing and medical electronic technologies has led to the emergence of Artificial Intelligence (AI)-based methods for the early detection of depression. However, previous studies have often overlooked the necessity for the AI-assisted diagnosis system to be wearable and accessible in practical scenarios for depression recognition. In this work, we present an on-board executable multi-feature transfer-enhanced fusion model for our custom-designed wearable three-lead Electroencephalogram (EEG) sensor, based on EEG data collected from 73 depressed patients and 108 healthy controls. Experimental results show that the proposed model exhibits low-computational complexity (65.0 K parameters), promising Floating-Point Operations (FLOPs) performance (25.6 M), real-time processing (1.5 s/execution), and low power consumption (320.8 mW). Furthermore, it requires only 202.0 KB of Random Access Memory (RAM) and 279.6 KB of Read-Only Memory (ROM) when deployed on the EEG sensor. Despite its low computational and spatial complexity, the model achieves a notable classification accuracy of 95.2%, specificity of 94.0%, and sensitivity of 96.9% under independent test conditions. These results underscore the potential of deploying the model on the wearable three-lead EEG sensor for assisting in the diagnosis of depression.

MCML Authors
Link to Profile Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[1307]
J. Beck, L. M. Kemeter, K. Dürrbeck, M. H. I. Abdalla and F. Kreuter.
Towards integrating ChatGPT into satellite image annotation workflows. A comparison of label quality and costs of human and automated annotators.
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan. 2025). To be published.
Abstract

High-quality annotations are a critical success factor for machine learning (ML) applications. To achieve this, we have traditionally relied on human annotators, navigating the challenges of limited budgets and of varying task-specific expertise, costs, and availability. Since the emergence of Large Language Models (LLMs), their popularity for generating automated annotations has grown, extending the possibilities and complexity of designing an efficient annotation strategy. Increasingly, computer vision capabilities have been integrated into general-purpose LLMs like ChatGPT. This raises the question of how effectively LLMs can be used in satellite image annotation tasks and how they compare to traditional annotator types. This study presents a comprehensive investigation and comparison of various human and automated annotators for image classification. We evaluate the feasibility and economic competitiveness of the ChatGPT4-V model for a complex land-usage annotation task and compare it with alternative human annotators. A set of satellite images is annotated by a domain expert and 15 additional human and automated annotators, differing in expertise and costs. Our analyses examine the annotation quality loss between the expert and the other annotators. This comparison is conducted through (1) descriptive analyses, (2) fitting linear probability models, and (3) comparing F1-scores. Finally, we simulate annotation strategies in which samples are split according to an automatically assigned certainty score. Routing low-certainty images to human annotators can cut total annotation costs by over 50% with minimal impact on label quality. We discuss implications regarding the economic competitiveness of annotation strategies, prompt engineering, and the task-specificity of expertise.
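The certainty-based routing simulated in the study can be illustrated with a toy cost model (prices, threshold, and function name are hypothetical, not taken from the paper):

```python
def annotation_cost(certainties, cost_llm=0.01, cost_human=0.50, threshold=0.8):
    """Total cost when images the LLM is uncertain about (certainty below
    `threshold`) are additionally routed to a human annotator.

    The LLM cost is paid for every image, since the certainty score
    comes from the automated annotation itself.
    Returns (total_cost, share_routed_to_humans).
    """
    n_human = sum(1 for c in certainties if c < threshold)
    n_llm_only = len(certainties) - n_human
    total = n_llm_only * cost_llm + n_human * (cost_llm + cost_human)
    return total, n_human / len(certainties)
```

Sweeping `threshold` trades label quality against cost: at 0 everything stays with the cheap automated annotator, at 1 every image also incurs the human rate.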

MCML Authors
Link to website

Jacob Beck

Social Data Science and AI Lab

Link to Profile Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab


[1306]
A. Akman, Q. Sun and B. W. Schuller.
Improving Audio Explanations using Audio Language Models.
IEEE Signal Processing Letters Early Access (Jan. 2025). DOI
Abstract

Foundation models are widely utilised for their strong representational capabilities, driven by training on extensive datasets with self-supervised learning. The increasing complexity of these models highlights the importance of interpretability to enhance transparency and improve human understanding of their decision-making processes. Most existing interpretability methods explain model behaviour by attributing importance to individual data elements across different layers, based on their influence on the final prediction. By removing less important features, these approaches emphasise only the most relevant ones and overlook the broader representational space. In this study, we propose a novel framework for explanation generation that serves as an alternative to feature removal, offering a more comprehensive understanding of model behaviour. Our framework leverages the generative abilities of audio language models to replace removed features with contextually appropriate alternatives, providing a more complete view of the model’s decision-making process. Through extensive evaluations on standard benchmarks, including keyword spotting and speech emotion recognition, our approach demonstrates its effectiveness in generating high-quality audio explanations.

MCML Authors
Link to Profile Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[1305]
Y. Sun, Y. Zhou, X. Xu, J. Qi, F. Xu, Z. Ren and B. W. Schuller.
Weakly-Supervised Depression Detection in Speech Through Self-Learning Based Label Correction.
IEEE Transactions on Audio, Speech and Language Processing Early Access (Jan. 2025). DOI
Abstract

Automated Depression Detection (ADD) in speech aims to automatically estimate a person’s depressive attributes by applying artificial intelligence tools to spoken signals. Nevertheless, existing speech-based ADD works fail to sufficiently consider weakly-supervised cases with inaccurate labels, which may typically appear in intelligent mental health applications. In this regard, we propose the Self-Learning-based Label Correction (SLLC) approach for weakly-supervised depression detection in speech. The proposed approach employs a self-learning manner connecting a label correction module and a depression detection module. Within the approach, the label correction module fuses likelihood-ratio-based and prototype-based label correction strategies in order to effectively correct the inaccurate labels, while the depression detection module detects depressed samples through a 1D convolutional recurrent neural network with multiple types of losses. The experimental results on two depression detection corpora show that our proposed SLLC approach outperforms existing state-of-the-art speech-based depression detection approaches in the case of weak supervision with inaccurate labels.
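A prototype-based label-correction strategy of the kind named above can be sketched in a few lines (toy 1-D features and a hypothetical function name; not the SLLC implementation):

```python
def correct_labels(features, labels):
    """Prototype-based label correction: compute a mean feature vector
    (prototype) per class, then relabel each sample to the class of its
    nearest prototype."""
    dims = len(features[0])
    protos = {}
    for cls in set(labels):
        members = [f for f, l in zip(features, labels) if l == cls]
        protos[cls] = [sum(v[d] for v in members) / len(members) for d in range(dims)]

    def dist2(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(protos, key=lambda c: dist2(f, protos[c])) for f in features]
```

A sample whose noisy label places it far from its own class centroid gets pulled back to the cluster it actually resembles, which is the intuition behind correcting inaccurate depression labels before training.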

MCML Authors
Link to Profile Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[1304]
W. Huang, Z. Gu, Y. Shi, Z. Xiong and X. Zhu.
Semi-Supervised Building Footprint Extraction Using Debiased Pseudo-Labels.
IEEE Transactions on Geoscience and Remote Sensing 63 (Jan. 2025). DOI GitHub
Abstract

Accurate extraction of building footprints from satellite imagery is of high value. Currently, deep learning methods are predominant in this field due to their powerful representation capabilities. However, they generally require extensive pixel-wise annotations, which constrains their practical application. Semi-supervised learning (SSL) significantly mitigates this requirement by leveraging large volumes of unlabeled data for model self-training (ST), thus enhancing the viability of building footprint extraction. Despite its advantages, SSL faces a critical challenge: the imbalanced distribution between the majority background class and the minority building class, which often results in model bias toward the background during training. To address this issue, this article introduces a novel method called DeBiased matching (DBMatch) for semi-supervised building footprint extraction. DBMatch comprises three main components: 1) a basic supervised learning module (SUP) that uses labeled data for initial model training; 2) a classical weak-to-strong ST module that generates pseudo-labels from unlabeled data for further model ST; and 3) a novel logit debiasing (LDB) module that calculates a global logit bias between building and background, allowing for dynamic pseudo-label calibration. To verify the effectiveness of the proposed DBMatch, extensive experiments are performed on three public building footprint extraction datasets covering six global cities in the SSL setting. The experimental results demonstrate that our method significantly outperforms some advanced SSL methods in semi-supervised building footprint extraction.
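The idea of calibrating pseudo-labels with a global logit bias can be illustrated as follows (a simplified per-pixel stand-in for the LDB module, not the paper’s exact formulation):

```python
def debias_pseudo_labels(building_logits, background_logits):
    """Calibrate pseudo-labels by removing the global logit bias toward
    the majority background class before taking the arg-max."""
    # Global bias: how much higher background logits are on average.
    n = len(building_logits)
    bias = sum(bg - bu for bu, bg in zip(building_logits, background_logits)) / n
    labels = []
    for bu, bg in zip(building_logits, background_logits):
        # Shift building logits up by the measured bias, then decide.
        labels.append("building" if bu + bias > bg else "background")
    return labels
```

Without the shift, borderline building pixels would all be pseudo-labelled as background, reinforcing the class imbalance during self-training.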

MCML Authors
Link to website

Ziqi Gu

Data Science in Earth Observation

Link to Profile Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[1303]
J. Li, T. Su, B. Zhao, F. Lv, Q. Wang, N. Navab, Y. Hu and Z. Jiang.
Ultrasound Report Generation With Cross-Modality Feature Alignment via Unsupervised Guidance.
IEEE Transactions on Medical Imaging 44.1 (Jan. 2025). DOI
Abstract

Automatic report generation has arisen as a significant research area in computer-aided diagnosis, aiming to alleviate the burden on clinicians by generating reports automatically based on medical images. In this work, we propose a novel framework for automatic ultrasound report generation, leveraging a combination of unsupervised and supervised learning methods to aid the report generation process. Our framework incorporates unsupervised learning methods to extract potential knowledge from ultrasound text reports, serving as the prior information to guide the model in aligning visual and textual features, thereby addressing the challenge of feature discrepancy. Additionally, we design a global semantic comparison mechanism to enhance the performance of generating more comprehensive and accurate medical reports. To enable the implementation of ultrasound report generation, we constructed three large-scale ultrasound image-text datasets from different organs for training and validation purposes. Extensive evaluations with other state-of-the-art approaches exhibit its superior performance across all three datasets.

MCML Authors
Link to website

Jun Li

Computational Imaging and AI in Medicine

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Zhongliang Jiang

Dr.

Computer Aided Medical Procedures & Augmented Reality


[1302]
W. Mayr, A. Triantafyllopoulos, A. Batliner, B. W. Schuller and T. M. Berghaus.
Assessing the Clinical and Functional Status of COPD Patients Using Speech Analysis During and After Exacerbation.
International Journal of Chronic Obstructive Pulmonary Disease 20 (Jan. 2025). DOI
Abstract

Background: Chronic obstructive pulmonary disease (COPD) affects breathing, speech production, and coughing. We evaluated a machine learning analysis of speech for classifying the disease severity of COPD.
Methods: In this single centre study, non-consecutive COPD patients were prospectively recruited for comparing their speech characteristics during and after an acute COPD exacerbation. We extracted a set of spectral, prosodic, and temporal variability features, which were used as input to a support vector machine (SVM). Our baseline for predicting patient state was an SVM model using self-reported BORG and COPD Assessment Test (CAT) scores.
Results: In 50 COPD patients (52% males, 22% GOLD II, 44% GOLD III, 32% GOLD IV, all patients group E), speech analysis was superior to BORG and CAT scores alone in distinguishing between the during- and after-exacerbation status, achieving 84% prediction accuracy. CAT scores correlated with reading rhythm, and BORG scales with stability in articulation. Pulmonary function testing (PFT) correlated with speech pause rate and speech rhythm variability.
Conclusion: Speech analysis may be a viable technology for classifying COPD status, opening up new opportunities for remote disease monitoring.

MCML Authors
Link to website

Andreas Triantafyllopoulos

Health Informatics

Link to website

Anton Batliner

Dr.

Health Informatics

Link to Profile Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[1301]
B. Lange.
Moral parenthood and gestation: replies to Cordeiro, Murphy, Robinson and Baron.
Journal of Medical Ethics 51.2 (Jan. 2025). DOI
Abstract

I am grateful to James Cordeiro, Timothy Murphy, Heloise Robinson and Teresa Baron for their perceptive and stimulating comments on my article in this journal. In what follows, I seek to respond to some of the main points raised in each commentary.

MCML Authors
Link to Profile Ben Lange

Ben Lange

Dr.

Ethics of Artificial Intelligence


[1300]
B. Lange.
Moral parenthood: not gestational.
Journal of Medical Ethics 51.2 (Jan. 2025). DOI
Abstract

Parenting our biological children is a centrally important matter, but how, if at all, can it be justified? According to a contemporary influential line of thinking, the acquisition by parents of a moral right to parent their biological children should be grounded by appeal to the value of the intimate emotional relationship that gestation facilitates between a newborn and a gestational procreator. I evaluate two arguments in defence of this proposal and argue that both are unconvincing.

MCML Authors
Link to Profile Ben Lange

Ben Lange

Dr.

Ethics of Artificial Intelligence


[1299]
A. Bitarafan, M. Mozafari, M. F. Azampour, M. S. Baghshah, N. Navab and A. Farshad.
Self-supervised 3D medical image segmentation by flow-guided mask propagation learning.
Medical Image Analysis Journal Pre-proof.103478 (Jan. 2025). DOI GitHub
Abstract

Despite significant progress in 3D medical image segmentation using deep learning, manual annotation remains a labor-intensive bottleneck. Self-supervised mask propagation (SMP) methods have emerged to alleviate this challenge, allowing intra-volume segmentation with just a single slice annotation. However, previous SMP methods often rely on 2D information and ignore volumetric contexts. While our previous work, called Vol2Flow, attempts to address this concern, it exhibits limitations, including not focusing enough on local (i.e., slice-pair) information, neglecting global information (i.e., volumetric contexts) in the objective function, and error accumulation during slice-to-slice reconstruction. This paper introduces Flow2Mask, a novel SMP method developed to overcome the limitations of previous SMP approaches, particularly Vol2Flow. During training, Flow2Mask uses the proposed Local-to-Global (L2G) loss to learn inter-slice flow fields among all consecutive slices within a volume in an unsupervised manner. This dynamic loss is based on curriculum learning to gradually learn information within a volume from local to global contexts. Additionally, the Inter-Slice Smoothness (ISS) loss is introduced as a regularization term to encourage changes between slices to occur consistently and continuously. During inference, Flow2Mask leverages these 3D flow fields for inter-slice mask propagation in a 3D image, spreading the annotation from a single annotated slice to the entire volume. Moreover, we propose an automatic strategy to select the most representative slice as the initial annotation in the mask propagation process. Experimental evaluations on different abdominal datasets demonstrate that our proposed SMP method outperforms previous approaches and improves the overall mean DSC of Vol2Flow by +2.1%, +8.2%, and +4.0% on the Sliver, CHAOS, and 3D-IRCAD datasets, respectively. Furthermore, Flow2Mask even exhibits substantial improvements in weakly-supervised and self-supervised few-shot segmentation methods when applied as a mask completion tool.
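The Dice similarity coefficient (DSC) used to report these gains can be computed from two binary masks, here represented as sets of voxel coordinates (a generic sketch, not the paper’s evaluation code):

```python
def dice_score(pred_mask, gt_mask):
    """Dice similarity coefficient between two binary masks given as
    sets of voxel coordinates: 2|A∩B| / (|A| + |B|)."""
    pred = set(pred_mask)
    gt = set(gt_mask)
    if not pred and not gt:
        return 1.0  # both empty: perfect (vacuous) agreement
    return 2 * len(pred & gt) / (len(pred) + len(gt))
```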

MCML Authors
Link to website

Mohammad Farid Azampour

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality


[1298]
T. Li, S. Hofer, G. Moholdt, A. Igneczi, K. Heidler, X. Zhu and J. Bamber.
Pervasive glacier retreats across Svalbard from 1985 to 2023.
Nature Communications 16.705 (Jan. 2025). DOI
Abstract

A major uncertainty in predicting the behaviour of marine-terminating glaciers is ice dynamics driven by non-linear calving front retreat, which is poorly understood and modelled. Using 124919 calving front positions for 149 marine-terminating glaciers in Svalbard from 1985 to 2023, generated with deep learning, we identify pervasive calving front retreats for non-surging glaciers over the past 38 years. We observe widespread seasonal cycles in calving front position for over half of the glaciers. At the seasonal timescale, peak retreat rates exhibit a several-month phase lag, with changes on the west coast occurring before those on the east coast, coincident with regional ocean warming. This spatial variability in seasonal patterns is linked to different timings of warm ocean water inflow from the West Spitsbergen Current, demonstrating the dominant role of ice-ocean interaction in seasonal front changes. The interannual variability of calving front retreat shows a strong sensitivity to both atmospheric and oceanic warming, with immediate responses to large air and ocean temperature anomalies in 2016 and 2019, likely driven by atmospheric blocking that can influence extreme temperature variability. With more frequent blocking occurring and continued regional warming, future calving front retreats will likely intensify, leading to more significant glacier mass loss.

MCML Authors
Link to Profile Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[1297]
S. Feuerriegel, A. Maarouf, D. Bär, D. Geissler, J. Schweisthal, N. Pröllochs, C. E. Robertson, S. Rathje, J. Hartmann, S. M. Mohammad, O. Netzer, A. A. Siegel, B. Plank and J. J. Van Bavel.
Using natural language processing to analyse text data in behavioural science.
Nature Reviews Psychology (Jan. 2025). DOI
Abstract

Language is a uniquely human trait at the core of human interactions. The language people use often reflects their personality, intentions and state of mind. With the integration of the Internet and social media into everyday life, much of human communication is documented as written text. These online forms of communication (for example, blogs, reviews, social media posts and emails) provide a window into human behaviour and therefore present abundant research opportunities for behavioural science. In this Review, we describe how natural language processing (NLP) can be used to analyse text data in behavioural science. First, we review applications of text data in behavioural science. Second, we describe the NLP pipeline and explain the underlying modelling approaches (for example, dictionary-based approaches and large language models). We discuss the advantages and disadvantages of these methods for behavioural science, in particular with respect to the trade-off between interpretability and accuracy. Finally, we provide actionable recommendations for using NLP to ensure rigour and reproducibility.
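A dictionary-based approach, the simplest of the modelling families the Review covers, can be sketched in a few lines (toy lexicon and hypothetical function name):

```python
def dictionary_score(text, lexicon):
    """Dictionary-based text analysis: the share of tokens that match
    a predefined word list (e.g. a sentiment or emotion lexicon)."""
    tokens = text.lower().split()
    hits = sum(1 for t in tokens if t.strip(".,!?") in lexicon)
    return hits / len(tokens) if tokens else 0.0

positive = {"good", "great", "happy"}
score = dictionary_score("A great day, a good mood!", positive)  # 2 of 6 tokens
```

Such lexicon counts are highly interpretable but less accurate than large language models, the trade-off the Review discusses.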

MCML Authors
Link to Profile Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management

Link to website

Abdurahman Maarouf

Artificial Intelligence in Management

Link to website

Dominik Bär

Artificial Intelligence in Management

Link to website

Jonas Schweisthal

Artificial Intelligence in Management

Link to Profile Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1296]
B. Lange.
Duplicates and Collective Scarcity.
Philosophy and Technology 38.7 (Jan. 2025). DOI
Abstract

Digital duplicates reduce the scarcity of individuals and thus may impact their instrumental and intrinsic value. I here expand upon this idea by introducing the notion of collective scarcity, which pertains to the limitations faced by social groups in maintaining their size, cohesion and function.

MCML Authors
Link to Profile Ben Lange

Ben Lange

Dr.

Ethics of Artificial Intelligence


[1295]
M. Binz, S. Alaniz, A. Roskies, B. Aczel, C. T. Bergstrom, C. Allen, D. Schad, D. Wulff, J. D. West, Q. Zhang, R. M. Shiffrin, S. J. Gershman, V. Popov, E. M. Bender, M. Marelli, M. M. Botvinick, Z. Akata and E. Schulz.
How should the advancement of large language models affect the practice of science?.
Proceedings of the National Academy of Sciences 122.5 (Jan. 2025). DOI
Abstract

Large language models (LLMs) are being increasingly incorporated into scientific workflows. However, we have yet to fully grasp the implications of this integration. How should the advancement of large language models affect the practice of science? For this opinion piece, we have invited four diverse groups of scientists to reflect on this query, sharing their perspectives and engaging in debate. Schulz et al. make the argument that working with LLMs is not fundamentally different from working with human collaborators, while Bender et al. argue that LLMs are often misused and overhyped, and that their limitations warrant a focus on more specialized, easily interpretable tools. Marelli et al. emphasize the importance of transparent attribution and responsible use of LLMs. Finally, Botvinick and Gershman advocate that humans should retain responsibility for determining the scientific roadmap. To facilitate the discussion, the four perspectives are complemented with a response from each group. By putting these different perspectives in conversation, we aim to bring attention to important considerations within the academic community regarding the adoption of LLMs and their impact on both current and future scientific practices.

MCML Authors
Link to Profile Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1294]
K. Ghosh, M. Todorović, A. Vehtari and P. Rinke.
Active learning of molecular data for task-specific objectives.
The Journal of Chemical Physics 162.014103 (Jan. 2025). DOI
Abstract

Active learning (AL) has shown promise to be a particularly data-efficient machine learning approach. Yet, its performance depends on the application, and it is not clear when AL practitioners can expect computational savings. Here, we carry out a systematic AL performance assessment for three diverse molecular datasets and two common scientific tasks: compiling compact, informative datasets and targeted molecular searches. We implemented AL with Gaussian processes (GP) and used the many-body tensor as molecular representation. For the first task, we tested different data acquisition strategies, batch sizes, and GP noise settings. AL was insensitive to the acquisition batch size, and we observed the best AL performance for the acquisition strategy that combines uncertainty reduction with clustering to promote diversity. However, for optimal GP noise settings, AL did not outperform the randomized selection of data points. Conversely, for targeted searches, AL outperformed random sampling and achieved data savings of up to 64%. Our analysis provides insight into this task-specific performance difference in terms of target distributions and data collection strategies. We established that the performance of AL depends on the relative distribution of the target molecules in comparison to the total dataset distribution, with the largest computational savings achieved when their overlap is minimal.
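The acquisition step at the heart of such an AL loop can be sketched in a few lines. This is an illustrative sketch under our own assumptions, not the authors' code: `acquire` and `toy_predict` are hypothetical names, and the toy surrogate merely stands in for a trained GP's predictive variance.

```python
# Illustrative sketch (assumption): uncertainty-based acquisition for active
# learning - pick the unlabeled candidate with the largest predictive
# variance, as a GP-driven AL loop would.
def acquire(candidates, predict):
    """predict(x) -> (mean, variance); return index of most uncertain point."""
    variances = [predict(x)[1] for x in candidates]
    return max(range(len(candidates)), key=lambda i: variances[i])

# Toy surrogate: variance grows with distance from a single labeled point at 0.
toy_predict = lambda x: (0.0, abs(x))
idx = acquire([0.5, 3.0, 1.2, -2.0], toy_predict)  # most uncertain candidate
```

In a real GP-based loop, the selected point would be labeled, added to the training set, and the model refit before the next acquisition.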

MCML Authors
Link to Profile Patrick Rinke

Patrick Rinke

Prof. Dr.

AI-based Material Science


[1293]
C. Curreli, D. Muhle, A. Saroha, Z. Ye, R. Marin and D. Cremers.
Nonisotropic Gaussian Diffusion for Realistic 3D Human Motion Prediction.
Preprint (Jan. 2025). arXiv GitHub
Abstract

Probabilistic human motion prediction aims to forecast multiple possible future movements from past observations. While current approaches report high diversity and realism, they often generate motions with undetected limb stretching and jitter. To address this, we introduce SkeletonDiffusion, a latent diffusion model that embeds an explicit inductive bias on the human body within its architecture and training. Our model is trained with a novel nonisotropic Gaussian diffusion formulation that aligns with the natural kinematic structure of the human skeleton. Results show that our approach outperforms conventional isotropic alternatives, consistently generating realistic predictions while avoiding artifacts such as limb distortion. Additionally, we identify a limitation in commonly used diversity metrics, which may inadvertently favor models that produce inconsistent limb lengths within the same sequence. SkeletonDiffusion sets a new benchmark on three real-world datasets, outperforming various baselines across multiple evaluation metrics.

MCML Authors
Link to website

Cecilia Curreli

Computer Vision & Artificial Intelligence

Link to website

Dominik Muhle

Computer Vision & Artificial Intelligence

Link to website

Abhishek Saroha

Computer Vision & Artificial Intelligence

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1292]
F. Drexel, V. Sideri-Lampretsa, H. Bast, A. W. Marka, T. Koehler, F. T. Gassert, D. Pfeiffer, D. Rückert and F. Pfeiffer.
Deformable Image Registration of Dark-Field Chest Radiographs for Local Lung Signal Change Assessment.
Preprint (Jan. 2025). arXiv
Abstract

Dark-field radiography of the human chest has been demonstrated to have promising potential for the analysis of the lung microstructure and the diagnosis of respiratory diseases. However, previous studies of dark-field chest radiographs evaluated the lung signal only in the inspiratory breathing state. Our work aims to add a new perspective to these previous assessments by locally comparing dark-field lung information between different respiratory states. To this end, we discuss suitable image registration methods for dark-field chest radiographs to enable consistent spatial alignment of the lung in distinct breathing states. Utilizing full inspiration and expiration scans from a clinical chronic obstructive pulmonary disease study, we assess the performance of the proposed registration framework and outline applicable evaluation approaches. Our regional characterization of lung dark-field signal changes between the breathing states provides a proof-of-principle that dynamic radiography-based lung function assessment approaches may benefit from considering registered dark-field images in addition to standard plain chest radiographs.

MCML Authors
Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[1291]
F. Duelmer, M. Azampour and N. Navab.
UltraRay: Full-Path Ray Tracing for Enhancing Realism in Ultrasound Simulation.
Preprint (Jan. 2025). arXiv
Abstract

Traditional ultrasound simulators solve the wave equation to model pressure distribution fields, achieving high accuracy but requiring significant computational time and resources. To address this, ray tracing approaches have been introduced, modeling wave propagation as rays interacting with boundaries and scatterers. However, existing models simplify ray propagation, generating echoes at interaction points without considering return paths to the sensor. This can result in unrealistic artifacts and necessitates careful scene tuning for plausible results. We propose a novel ultrasound simulation pipeline that utilizes a ray tracing algorithm to generate echo data, tracing each ray from the transducer through the scene and back to the sensor. To replicate advanced ultrasound imaging, we introduce a ray emission scheme optimized for plane wave imaging, incorporating delay and steering capabilities. Furthermore, we integrate a standard signal processing pipeline to simulate end-to-end ultrasound image formation. We showcase the efficacy of the proposed pipeline by modeling synthetic scenes featuring highly reflective objects, such as bones. In doing so, our proposed approach, UltraRay, not only enhances the overall visual quality but also improves the realism of the simulated images by accurately capturing secondary reflections and reducing unnatural artifacts. By building on top of a differentiable framework, the proposed pipeline lays the groundwork for a fast and differentiable ultrasound simulation tool necessary for gradient-based optimization, enabling advanced ultrasound beamforming strategies, neural network integration, and accurate inverse scene reconstruction.

MCML Authors
Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[1290]
S. Eckman, B. Ma, C. Kern, R. Chew, B. Plank and F. Kreuter.
Correcting Annotator Bias in Training Data: Population-Aligned Instance Replication (PAIR).
Preprint (Jan. 2025). arXiv
Abstract

Models trained on crowdsourced labels may not reflect broader population views when annotator pools are not representative. Since collecting representative labels is challenging, we propose Population-Aligned Instance Replication (PAIR), a method to address this bias through statistical adjustment. Using a simulation study of hate speech and offensive language detection, we create two types of annotators with different labeling tendencies and generate datasets with varying proportions of the types. Models trained on unbalanced annotator pools show poor calibration compared to those trained on representative data. However, PAIR, which duplicates labels from underrepresented annotator groups to match population proportions, significantly reduces bias without requiring new data collection. These results suggest statistical techniques from survey research can help align model training with target populations even when representative annotator pools are unavailable. We conclude with three practical recommendations for improving training data quality.
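The core replication idea (duplicating labels from underrepresented annotator groups until the pool matches assumed population proportions) can be sketched as follows. This is an illustrative sketch rather than the authors' released code; the function name `pair_replicate` and the integer rounding rule are our assumptions.

```python
# Illustrative sketch (not the authors' code): PAIR-style label replication.
# Duplicate labels from underrepresented annotator groups so the pool
# approaches assumed target-population proportions.
from collections import Counter

def pair_replicate(labels, groups, target_props):
    """labels: annotations; groups: annotator group per label;
    target_props: desired share of each group in the population."""
    counts = Counter(groups)
    n = len(labels)
    out_labels, out_groups = [], []
    for lab, g in zip(labels, groups):
        observed = counts[g] / n
        # replication factor: roughly how many copies reach the target share
        k = max(1, round(target_props[g] / observed))
        out_labels.extend([lab] * k)
        out_groups.extend([g] * k)
    return out_labels, out_groups

labels = [1, 1, 0, 0, 0, 0, 0, 0]                   # toy annotations
groups = ["A", "A", "B", "B", "B", "B", "B", "B"]   # group A underrepresented
target = {"A": 0.5, "B": 0.5}                       # assumed population shares
new_labels, new_groups = pair_replicate(labels, groups, target)
```

Here each label from the underrepresented group A is duplicated, moving the pool's group shares toward the assumed population proportions without collecting new data.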

MCML Authors
Link to website

Bolei Ma

Social Data Science and AI Lab

Link to Profile Christoph Kern

Christoph Kern

Prof. Dr.

Social Data Science and AI Lab

Link to Profile Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

Link to Profile Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab


[1289]
Y. Feng, S. Feuerriegel and Y. R. Shrestha.
Contextualizing Recommendation Explanations with LLMs: A User Study.
Preprint (Jan. 2025). arXiv
Abstract

Large language models (LLMs) are increasingly prevalent in recommender systems, where LLMs can be used to generate personalized recommendations. Here, we examine how different LLM-generated explanations for movie recommendations affect users’ perceptions of cognitive, affective, and utilitarian needs and consumption intentions. In a pre-registered, between-subject online experiment (N=759) and follow-up interviews (N=30), we compare (a) LLM-generated generic explanations, and (b) LLM-generated contextualized explanations. Our findings show that contextualized explanations (i.e., explanations that incorporate users’ past behaviors) effectively meet users’ cognitive needs while increasing users’ intentions to watch recommended movies. However, adding explanations offers limited benefits in meeting users’ utilitarian and affective needs, raising concerns about the proper design and implications of LLM-generated explanations. Qualitative insights from interviews reveal that referencing users’ past preferences enhances trust and understanding but can feel excessive if overused. Furthermore, users with more active and positive engagement with the recommender system and movie-watching get substantial gains from contextualized explanations. Overall, our research clarifies how LLM-generated recommendations influence users’ motivations and behaviors, providing valuable insights for the future development of user-centric recommender systems, a key element in social media platforms and online ecosystems.

MCML Authors
Link to Profile Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1288]
U. Fischer-Abaigar, C. Kern and J. Perdomo.
The Value of Prediction in Identifying the Worst-Off.
Preprint (Jan. 2025). arXiv
Abstract

Machine learning is increasingly used in government programs to identify and support the most vulnerable individuals, prioritizing assistance for those at greatest risk over optimizing aggregate outcomes. This paper examines the welfare impacts of prediction in equity-driven contexts, and how they compare to other policy levers, such as expanding bureaucratic capacity. Through mathematical models and a real-world case study on long-term unemployment amongst German residents, we develop a comprehensive understanding of the relative effectiveness of prediction in surfacing the worst-off. Our findings provide clear analytical frameworks and practical, data-driven tools that empower policymakers to make principled decisions when designing these systems.

MCML Authors
Link to Profile Christoph Kern

Christoph Kern

Prof. Dr.

Social Data Science and AI Lab


[1287]
Z. Haouari, J. Weidner, I. Ezhov, A. Varma, D. Rückert, B. Menze and B. Wiestler.
Efficient Deep Learning-based Forward Solvers for Brain Tumor Growth Models.
Preprint (Jan. 2025). arXiv
Abstract

Glioblastoma, a highly aggressive brain tumor, poses major challenges due to its poor prognosis and high morbidity rates. Partial differential equation-based models offer promising potential to enhance therapeutic outcomes by simulating patient-specific tumor behavior for improved radiotherapy planning. However, model calibration remains a bottleneck due to the high computational demands of optimization methods like Monte Carlo sampling and evolutionary algorithms. To address this, we recently introduced an approach leveraging a neural forward solver with gradient-based optimization to significantly reduce calibration time. This approach requires a highly accurate and fully differentiable forward model. We investigate multiple architectures, including (i) an enhanced TumorSurrogate, (ii) a modified nnU-Net, and (iii) a 3D Vision Transformer (ViT). The optimized TumorSurrogate achieved the best overall results, excelling in both tumor outline matching and voxel-level prediction of tumor cell concentration. It halved the MSE relative to the baseline model and achieved the highest Dice score across all tumor cell concentration thresholds. Our study demonstrates significant enhancement in forward solver performance and outlines important future research directions.

MCML Authors
Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine

Link to Profile Benedikt Wiestler

Benedikt Wiestler

Prof. Dr.

AI for Image-Guided Diagnosis and Therapy


[1286]
B. Jian, J. Pan, Y. Li, F. Bongratz, R. Li, D. Rückert, B. Wiestler and C. Wachinger.
TimeFlow: Longitudinal Brain Image Registration and Aging Progression Analysis.
Preprint (Jan. 2025). arXiv
Abstract

Predicting future brain states is crucial for understanding healthy aging and neurodegenerative diseases. Longitudinal brain MRI registration, a cornerstone for such analyses, has long been limited by its inability to forecast future developments, reliance on extensive, dense longitudinal data, and the need to balance registration accuracy with temporal smoothness. In this work, we present TimeFlow, a novel framework for longitudinal brain MRI registration that overcomes all these challenges. Leveraging a U-Net architecture with temporal conditioning inspired by diffusion models, TimeFlow enables accurate longitudinal registration and facilitates prospective analyses through future image prediction. Unlike traditional methods that depend on explicit smoothness regularizers and dense sequential data, TimeFlow achieves temporal consistency and continuity without these constraints. Experimental results highlight its superior performance in both future timepoint prediction and registration accuracy compared to state-of-the-art methods. Additionally, TimeFlow supports novel biological brain aging analyses, effectively differentiating neurodegenerative conditions from healthy aging. It eliminates the need for segmentation, thereby avoiding the challenges of non-trivial annotation and inconsistent segmentation errors. TimeFlow paves the way for accurate, data-efficient, and annotation-free prospective analyses of brain aging and chronic diseases.

MCML Authors
Link to website

Bailiang Jian

Artificial Intelligence in Radiology

Link to website

Yitong Li

Artificial Intelligence in Radiology

Link to website

Fabian Bongratz

Artificial Intelligence in Radiology

Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine

Link to Profile Benedikt Wiestler

Benedikt Wiestler

Prof. Dr.

AI for Image-Guided Diagnosis and Therapy

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Radiology


[1285]
O. Kononykhina, M. Schierholz and F. Kreuter.
The Impact of Question Framing on the Precision of Automatic Occupation Coding.
Preprint (Jan. 2025). arXiv
Abstract

Occupational data play a vital role in research, official statistics, and policymaking, yet their collection and accurate classification remain a persistent challenge. This study investigates the effects of occupational question wording on data variability and the performance of automatic coding tools. Through a series of survey experiments conducted and replicated in Germany, we tested two widely-used occupational question formats: one focusing on ‘job title’ (Berufsbezeichnung) and another on ‘occupational tasks’ (berufliche Tätigkeit). Our analysis reveals that automatic coding tools, such as CASCOT and OccuCoDe, exhibit significant sensitivity to the form and origin of the data. Specifically, these tools performed more efficiently when coding responses to the job title question format compared to the occupational task format. Additionally, we found that including examples of main tasks and duties in the questions led respondents to provide more detailed but less linguistically diverse responses. This reduced diversity may negatively affect the precision of automatic coding. These findings highlight the importance of tailoring automatic coding tools to the specific structure and origin of the data they are applied to. We emphasize the need for further research to optimize question design and coding tools for greater accuracy and applicability in occupational data collection.

MCML Authors
Link to website

Olga Kononykhina

Social Data Science and AI Lab

Link to website

Malte Schierholz

Dr.

Social Data Science and AI Lab

Link to Profile Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab


[1284]
C. Leininger, S. Rittel and L. Bothmann.
Overcoming Fairness Trade-offs via Pre-processing: A Causal Perspective.
Preprint (Jan. 2025). arXiv
Abstract

Training machine learning models for fair decisions faces two key challenges: the fairness-accuracy trade-off, which arises because enforcing fairness weakens predictive performance relative to an unconstrained model, and the incompatibility of different fairness metrics, also known as the impossibility theorem. Recent work identifies the bias within the observed data as a possible root cause and shows that fairness and predictive performance are in fact in accord when predictive performance is measured on unbiased data. We offer a causal explanation for these findings using the framework of the FiND (fictitious and normatively desired) world, a ‘fair’ world, where protected attributes have no causal effects on the target variable. We show theoretically that (i) classical fairness metrics deemed to be incompatible are naturally satisfied in the FiND world, while (ii) fairness aligns with high predictive performance. We extend our analysis by suggesting how one can benefit from these theoretical insights in practice, using causal pre-processing methods that approximate the FiND world. Additionally, we propose a method for evaluating the approximation of the FiND world via pre-processing in practical use cases where we do not have access to the FiND world. In simulations and empirical studies, we demonstrate that these pre-processing methods are successful in approximating the FiND world and resolve both trade-offs. Our results provide actionable solutions for practitioners to achieve fairness and high predictive performance simultaneously.
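A minimal flavor of causal pre-processing can be given by group-mean centering, which removes the marginal effect of a binary protected attribute from a feature. This is only an illustrative sketch under our own assumptions, not the paper's method: `residualize` is a hypothetical name, and the paper's FiND-world approximation is considerably more general than this crude adjustment.

```python
# Illustrative sketch (assumption): remove the group-level effect of a
# protected attribute A on a feature X by centering X within groups,
# then restoring the overall mean.
def residualize(x, a):
    groups = set(a)
    means = {g: sum(xi for xi, ai in zip(x, a) if ai == g) /
                sum(1 for ai in a if ai == g) for g in groups}
    overall = sum(x) / len(x)
    return [xi - means[ai] + overall for xi, ai in zip(x, a)]

x = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]  # feature correlated with group
a = [0, 0, 0, 1, 1, 1]              # binary protected attribute
x_fair = residualize(x, a)          # group means now coincide
```

After this step the two groups have identical feature means, so a downstream model can no longer pick up the group-level difference from this feature alone.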

MCML Authors
Link to website

Simon Rittel

Statistical Learning & Data Science

Link to website

Ludwig Bothmann

Dr.

Statistical Learning & Data Science


[1283]
T. Liu, X. Yu, W. Zhou, J. Gu and V. Tresp.
FocalPO: Enhancing Preference Optimizing by Focusing on Correct Preference Rankings.
Preprint (Jan. 2025). arXiv
Abstract

Efficient preference optimization algorithms such as Direct Preference Optimization (DPO) have become a popular approach in aligning large language models (LLMs) with human preferences. These algorithms implicitly treat the LLM as a reward model and focus on training it to correct misranked preference pairs. However, recent work (Chen et al., 2024) empirically finds that DPO training rarely improves these misranked preference pairs, despite its gradient emphasizing these cases. We introduce FocalPO, a DPO variant that instead down-weighs misranked preference pairs and prioritizes enhancing the model’s understanding of pairs that it can already rank correctly. Inspired by Focal Loss used in vision tasks, FocalPO achieves this by adding a modulating factor to dynamically scale DPO loss. Our experiments demonstrate that FocalPO surpasses DPO and its variants on popular benchmarks like Alpaca Eval 2.0 using Mistral-Base-7B and Llama-3-Instruct-8B. Additionally, we empirically reveal how FocalPO affects training on correct and incorrect sample groups, further underscoring its effectiveness.
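One plausible reading of the modulating factor can be sketched as follows. This is an illustrative sketch, not the released implementation: the exact factor is defined in the paper, and using the ranking confidence p raised to a power gamma (so that misranked pairs, where p is small, are down-weighted) is our assumption, by analogy with Focal Loss.

```python
# Illustrative sketch (assumption, not the authors' code): a focal-style
# factor that scales the per-pair DPO loss so that misranked pairs
# (ranking confidence p < 0.5) contribute less to the gradient.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def focal_dpo_loss(delta, beta=0.1, gamma=2.0):
    """delta: implicit reward margin (chosen minus rejected);
    p = sigmoid(beta * delta) is the model's ranking confidence."""
    p = sigmoid(beta * delta)
    dpo = -math.log(p)            # standard per-pair DPO loss
    return (p ** gamma) * dpo     # p^gamma shrinks toward 0 when misranked

correct = focal_dpo_loss(delta=20.0)   # pair already ranked correctly
wrong = focal_dpo_loss(delta=-20.0)    # misranked pair, down-weighted
```

With gamma = 0 the factor is 1 and the standard DPO loss is recovered; larger gamma shifts the training signal toward pairs the model already ranks correctly.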

MCML Authors
Link to website

Tong Liu

Database Systems & Data Mining

Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[1282]
T. Mortier, A. Javanmardi, Y. Sale, E. Hüllermeier and W. Waegeman.
Conformal Prediction in Hierarchical Classification.
Preprint (Jan. 2025). arXiv
Abstract

Conformal prediction has emerged as a widely used framework for constructing valid prediction sets in classification and regression tasks. In this work, we extend the split conformal prediction framework to hierarchical classification, where prediction sets are commonly restricted to internal nodes of a predefined hierarchy, and propose two computationally efficient inference algorithms. The first algorithm returns internal nodes as prediction sets, while the second relaxes this restriction, using the notion of representation complexity, yielding a more general and combinatorial inference problem, but smaller set sizes. Empirical evaluations on several benchmark datasets demonstrate the effectiveness of the proposed algorithms in achieving nominal coverage.
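The split conformal recipe the paper builds on can be sketched for the flat-label case. This is an illustrative sketch under simplifying assumptions, not the paper's hierarchical algorithms, which additionally restrict prediction sets to internal nodes of a taxonomy; the function names are ours.

```python
# Illustrative sketch (simplified, flat-label split conformal): calibrate a
# threshold on nonconformity scores, then return every candidate label
# whose score falls below it.
import math

def conformal_threshold(cal_scores, alpha=0.1):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))   # rank of the corrected quantile
    return sorted(cal_scores)[min(k, n) - 1]

def prediction_set(label_scores, threshold):
    """label_scores: nonconformity score per candidate label."""
    return {lab for lab, s in label_scores.items() if s <= threshold}

cal = [0.1, 0.3, 0.2, 0.5, 0.4, 0.15, 0.25, 0.35, 0.45]  # toy calibration set
q = conformal_threshold(cal, alpha=0.2)
pred = prediction_set({"cat": 0.1, "dog": 0.3, "car": 0.9}, q)
```

The guarantee is marginal: over repeated draws, the set contains the true label with probability at least 1 - alpha, regardless of the underlying classifier.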

MCML Authors
Link to website

Alireza Javanmardi

Artificial Intelligence & Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[1281]
A. Reuter, T. G. J. Rudner, V. Fortuin and D. Rügamer.
Can Transformers Learn Full Bayesian Inference in Context?.
Preprint (Jan. 2025). arXiv
Abstract

Transformers have emerged as the dominant architecture in the field of deep learning, with a broad range of applications and remarkable in-context learning (ICL) capabilities. While not yet fully understood, ICL has already proved to be an intriguing phenomenon, allowing transformers to learn in context – without requiring further training. In this paper, we further advance the understanding of ICL by demonstrating that transformers can perform full Bayesian inference for commonly used statistical models in context. More specifically, we introduce a general framework that builds on ideas from prior fitted networks and continuous normalizing flows which enables us to infer complex posterior distributions for methods such as generalized linear models and latent factor models. Extensive experiments on real-world datasets demonstrate that our ICL approach yields posterior samples that are similar in quality to state-of-the-art MCMC or variational inference methods not operating in context.

MCML Authors
Link to Profile Vincent Fortuin

Vincent Fortuin

Dr.

Bayesian Deep Learning

Link to Profile David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[1280]
A. Saroha, F. Hofherr, M. Gladkova, C. Curreli, O. Litany and D. Cremers.
ZDySS -- Zero-Shot Dynamic Scene Stylization using Gaussian Splatting.
Preprint (Jan. 2025). arXiv
Abstract

Stylizing a dynamic scene based on an exemplar image is critical for various real-world applications, including gaming, filmmaking, and augmented and virtual reality. However, achieving consistent stylization across both spatial and temporal dimensions remains a significant challenge. Most existing methods are designed for static scenes and often require an optimization process for each style image, limiting their adaptability. We introduce ZDySS, a zero-shot stylization framework for dynamic scenes, allowing our model to generalize to previously unseen style images at inference. Our approach employs Gaussian splatting for scene representation, linking each Gaussian to a learned feature vector that renders a feature map for any given view and timestamp. By applying style transfer on the learned feature vectors instead of the rendered feature map, we enhance spatio-temporal consistency across frames. Our method demonstrates superior performance and coherence over state-of-the-art baselines in tests on real-world dynamic scenes, making it a robust solution for practical applications.

MCML Authors
Link to website

Abhishek Saroha

Computer Vision & Artificial Intelligence

Link to website

Florian Hofherr

Computer Vision & Artificial Intelligence

Link to website

Mariia Gladkova

Computer Vision & Artificial Intelligence

Link to website

Cecilia Curreli

Computer Vision & Artificial Intelligence

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1279]
R. Schwank, A. McCormack and M. Drton.
Robust Score Matching.
Preprint (Jan. 2025). arXiv
Abstract

Proposed in Hyvärinen (2005), score matching is a parameter estimation procedure that does not require computation of distributional normalizing constants. In this work we utilize the geometric median of means to develop a robust score matching procedure that yields consistent parameter estimates in settings where the observed data has been contaminated. A special appeal of the proposed method is that it retains convexity in exponential family models. The new method is therefore particularly attractive for non-Gaussian, exponential family graphical models where evaluation of normalizing constants is intractable. Support recovery guarantees for such models when contamination is present are provided. Additionally, support recovery is studied in numerical experiments and on a precipitation dataset. We demonstrate that the proposed robust score matching estimator performs comparably to the standard score matching estimator when no contamination is present but greatly outperforms this estimator in a setting with contamination.
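The robustness mechanism can be illustrated with the one-dimensional median of means; note this is a simplified sketch under our own assumptions, as the paper uses the geometric median of means for vector-valued quantities.

```python
# Illustrative sketch (assumption: 1-D median of means): split the sample
# into blocks, average each block, and take the median of the block means.
# A few gross outliers corrupt at most a few blocks, leaving the median intact.
import statistics

def median_of_means(xs, n_blocks=5):
    m = len(xs) // n_blocks
    block_means = [sum(xs[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
    return statistics.median(block_means)

# Clean samples around 0 plus two gross outliers ("contamination").
data = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.2, -0.15, 0.1,
        0.0, -0.1, 0.05, 100.0, 100.0]
robust = median_of_means(data, n_blocks=5)  # stays near the clean mean
naive = sum(data) / len(data)               # dragged away by the outliers
```

The contaminated observations land in a single block, so only one block mean is corrupted and the median over blocks remains close to the uncontaminated mean.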

MCML Authors
Link to Profile Mathias Drton

Mathias Drton

Prof. Dr.

Mathematical Statistics


[1278]
M. H. Shaker and E. Hüllermeier.
Random Forest Calibration.
Preprint (Jan. 2025). arXiv
Abstract

The Random Forest (RF) classifier is often claimed to be relatively well calibrated when compared with other machine learning methods. Moreover, the existing literature suggests that traditional calibration methods, such as isotonic regression, do not substantially enhance the calibration of RF probability estimates unless supplied with extensive calibration data sets, which can represent a significant obstacle in cases of limited data availability. Nevertheless, there seems to be no comprehensive study validating such claims and systematically comparing state-of-the-art calibration methods specifically for RF. To close this gap, we investigate a broad spectrum of calibration methods tailored to or at least applicable to RF, ranging from scaling techniques to more advanced algorithms. Our results, based on synthetic as well as real-world data, unravel the intricacies of RF probability estimates, scrutinize the impacts of hyper-parameters, and compare calibration methods in a systematic way. We show that a well-optimized RF performs as well as or better than leading calibration approaches.
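Comparisons of this kind rest on a calibration metric; a standard choice is the expected calibration error (ECE), sketched below. This is an illustrative sketch of a common metric, not code from the paper, and the function name is ours.

```python
# Illustrative sketch (not from the paper): expected calibration error (ECE).
# Bin predictions by confidence and compare average confidence to the
# empirical frequency of the positive class in each bin.
def expected_calibration_error(probs, labels, n_bins=10):
    """probs: predicted probability of the positive class; labels: 0/1."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)   # assign to a confidence bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(p for p, _ in b) / len(b)
        acc = sum(y for _, y in b) / len(b)      # empirical frequency
        ece += (len(b) / n) * abs(avg_conf - acc)
    return ece

# Perfectly calibrated toy predictions yield an ECE of (essentially) zero.
probs = [0.2, 0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8, 0.8]
labels = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
ece = expected_calibration_error(probs, labels)
```

A well-calibrated model has predictions that match observed frequencies bin by bin, driving the ECE toward zero.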

MCML Authors
Link to website

Mohammad Hossein Shaker

Artificial Intelligence & Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[1277]
I. Tsangko, A. Triantafyllopoulos, M. Müller, H. Schröter and B. W. Schuller.
DFingerNet: Noise-Adaptive Speech Enhancement for Hearing Aids.
Preprint (Jan. 2025). arXiv
Abstract

The DeepFilterNet (DFN) architecture was recently proposed as a deep learning model suited for hearing aid devices. Despite its competitive performance on numerous benchmarks, it still follows a ‘one-size-fits-all’ approach, which aims to train a single, monolithic architecture that generalises across different noises and environments. However, its limited size and computation budget can hamper its generalisability. Recent work has shown that in-context adaptation can mitigate this by conditioning the denoising process on additional information extracted from background recordings. These recordings can be offloaded outside the hearing aid, thus improving performance while adding minimal computational overhead. We introduce these principles to the DFN model, thus proposing the DFingerNet (DFiN) model, which shows superior performance on various benchmarks inspired by the DNS Challenge.

MCML Authors
Link to website

Andreas Triantafyllopoulos

Health Informatics

Link to Profile Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[1276]
T. N. Wolf and C. Wachinger.
WASUP: Interpretable Classification with Weight-Input Alignment and Class-Discriminative SUPports Vectors.
Preprint (Jan. 2025). arXiv
Abstract

The deployment of deep learning models in critical domains necessitates a balance between high accuracy and interpretability. We introduce WASUP, an inherently interpretable neural network that provides local and global explanations of its decision-making process. We prove that these explanations are faithful by fulfilling established axioms for explanations. Leveraging the concept of case-based reasoning, WASUP extracts class-representative support vectors from training images, ensuring they capture relevant features while suppressing irrelevant ones. Classification decisions are made by calculating and aggregating similarity scores between these support vectors and the input’s latent feature vector. We employ B-Cos transformations, which align model weights with inputs to enable faithful mappings of latent features back to the input space, facilitating local explanations in addition to global explanations of case-based reasoning. We evaluate WASUP on three tasks: fine-grained classification on Stanford Dogs, multi-label classification on Pascal VOC, and pathology detection on the RSNA dataset. Results indicate that WASUP not only achieves competitive accuracy compared to state-of-the-art black-box models but also offers insightful explanations verified through theoretical analysis. Our findings underscore WASUP’s potential for applications where understanding model decisions is as critical as the decisions themselves.

MCML Authors
Link to website

Tom Nuno Wolf

Artificial Intelligence in Radiology

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Radiology


[1275]
Z. Yang, M. Song, X. Jing, H. Zhang, K. Qian, B. Hu, K. Tamada, T. Takumi, B. W. Schuller and Y. Yamamoto.
MADUV: The 1st INTERSPEECH Mice Autism Detection via Ultrasound Vocalization Challenge.
Preprint (Jan. 2025). arXiv
Abstract

The Mice Autism Detection via Ultrasound Vocalization (MADUV) Challenge introduces the first INTERSPEECH challenge focused on detecting autism spectrum disorder (ASD) in mice through their vocalizations. Participants are tasked with developing models to automatically classify mice as either wild-type or ASD models based on recordings with a high sampling rate. Our baseline system employs a simple CNN-based classification using three different spectrogram features. Results demonstrate the feasibility of automated ASD detection, with the considered audible-range features achieving the best performance (UAR of 0.600 for segment-level and 0.625 for subject-level classification). This challenge bridges speech technology and biomedical research, offering opportunities to advance our understanding of ASD models through machine learning approaches. The findings suggest promising directions for vocalization analysis and highlight the potential value of audible and ultrasound vocalizations in ASD detection.
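The UAR (unweighted average recall) metric used for the baseline results is the mean of the per-class recalls, which is robust to class imbalance. A minimal sketch (function name and example data are ours):

```python
def uar(y_true, y_pred):
    """Unweighted average recall: the mean of per-class recalls."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(classes)

# Imbalanced toy labels: class 0 recall 1.0, class 1 recall 0.5
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0]
score = uar(y_true, y_pred)  # (1.0 + 0.5) / 2 = 0.75
```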

MCML Authors
Link to Profile Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


2024


[1274]
K. Bieker, H. T. Kussaba, P. Scholl, J. Jung, A. Swikir, S. Haddadin and G. Kutyniok.
Compositional Construction of Barrier Functions for Switched Impulsive Systems.
CDC 2024 - 63rd IEEE Conference on Decision and Control. Milan, Italy, Dec 16-19, 2024. To be published. Preprint available. arXiv
Abstract

Many systems occurring in real-world applications, such as controlling the motions of robots or modeling the spread of diseases, are switched impulsive systems. To ensure that the system state stays in a safe region (e.g., to avoid collisions with obstacles), barrier functions are widely utilized. As the system dimension increases, deriving suitable barrier functions becomes extremely complex. Fortunately, many systems consist of multiple subsystems, such as different areas where the disease occurs. In this work, we present sufficient conditions for interconnected switched impulsive systems to maintain safety by constructing local barrier functions for the individual subsystems instead of a global one, allowing for much easier and more efficient derivation. To validate our results, we numerically demonstrate their effectiveness using an epidemiological model.

MCML Authors
Link to website

Philipp Scholl

Mathematical Foundations of Artificial Intelligence

Link to Profile Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[1273]
A. Koebler, T. Decker, I. Thon, V. Tresp and F. Buettner.
Incremental Uncertainty-aware Performance Monitoring with Labeling Intervention.
BDU @NeurIPS 2024 - Workshop Bayesian Decision-making and Uncertainty: from probabilistic and spatiotemporal modeling to sequential experiment design at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. URL
Abstract

We study the problem of monitoring machine learning models under temporal distribution shifts, where circumstances change gradually over time, often leading to unnoticed yet significant declines in accuracy. We propose Incremental Uncertainty-aware Performance Monitoring (IUPM), a novel label-free method that estimates model performance by modeling time-dependent shifts using optimal transport. IUPM also quantifies uncertainty in performance estimates and introduces an active labeling strategy to reduce this uncertainty. We further showcase the benefits of IUPM on different datasets and simulated temporal shifts over existing baselines.

MCML Authors
Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[1272]
B. Cong, N. Daheim, Y. Shen, D. Cremers, R. Yokota, M. Khan and T. Möllenhoff.
Variational Low-Rank Adaptation Using IVON.
FITML @NeurIPS 2024 - Workshop Fine-Tuning in Modern Machine Learning: Principles and Scalability at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv GitHub
Abstract

We show that variational learning can significantly improve the accuracy and calibration of Low-Rank Adaptation (LoRA) without a substantial increase in cost. We replace AdamW with the Improved Variational Online Newton (IVON) algorithm to finetune large language models. For Llama-2 with 7 billion parameters, IVON improves the accuracy over AdamW by 2.8% and the expected calibration error by 4.6%. The accuracy is also better than the other Bayesian alternatives, yet the cost is lower and the implementation is easier. Our work provides additional evidence for the effectiveness of IVON for large language models.
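The expected calibration error (ECE) reported above is a standard binned metric: predictions are grouped by confidence, and the per-bin gaps between accuracy and mean confidence are averaged with bin-size weights. A minimal sketch (function name and binning scheme are ours, not the paper's evaluation code):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: size-weighted |accuracy - confidence| gap per bin."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(acc - conf)
    return ece

# Two over-confident-free and one over-confident prediction
ece = expected_calibration_error([0.9, 0.9, 0.6], [1, 1, 0])
```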

MCML Authors
Link to website

Yuesong Shen

Computer Vision & Artificial Intelligence

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1271]
M. Kollovieh, B. Charpentier, D. Zügner and S. Günnemann.
Expected Probabilistic Hierarchies.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published.
Abstract

Hierarchical clustering has usually been addressed by discrete optimization using heuristics or continuous optimization of relaxed scores for hierarchies. In this work, we propose to optimize expected scores under a probabilistic model over hierarchies. (1) We show theoretically that the global optimal values of the expected Dasgupta cost and Tree-Sampling divergence (TSD), two unsupervised metrics for hierarchical clustering, are equal to the optimal values of their discrete counterparts contrary to some relaxed scores. (2) We propose Expected Probabilistic Hierarchies (EPH), a probabilistic model to learn hierarchies in data by optimizing expected scores. EPH uses differentiable hierarchy sampling enabling end-to-end gradient descent based optimization, and an unbiased subgraph sampling approach to scale to large datasets. (3) We evaluate EPH on synthetic and real-world datasets including vector and graph datasets. EPH outperforms all other approaches quantitatively and provides meaningful hierarchies in qualitative evaluations.
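The Dasgupta cost mentioned above sums, over the weighted graph edges, the edge weight times the number of leaves under the endpoints' lowest common ancestor, so good hierarchies keep heavy edges low in the tree. A small sketch for binary trees encoded as nested tuples (encoding and function names are ours; EPH optimizes an expectation of this score, which is not shown here):

```python
def leaves(tree):
    """Set of leaf labels in a nested-tuple binary tree."""
    if not isinstance(tree, tuple):
        return {tree}
    return leaves(tree[0]) | leaves(tree[1])

def dasgupta_cost(tree, edges):
    """Sum over weighted edges (i, j, w) of w times the number of
    leaves under the lowest common ancestor of i and j."""
    def lca_size(t, i, j):
        if isinstance(t, tuple):
            for child in t:
                l = leaves(child)
                if i in l and j in l:
                    return lca_size(child, i, j)
        return len(leaves(t))
    return sum(w * lca_size(tree, i, j) for i, j, w in edges)

# Hierarchy ((0, 1), (2, 3)): lca(0, 1) covers 2 leaves, lca(0, 2) covers 4
cost = dasgupta_cost(((0, 1), (2, 3)), [(0, 1, 1.0), (0, 2, 1.0)])
```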

MCML Authors
Link to website

Marcel Kollovieh

Data Analytics & Machine Learning

Link to Profile Stephan Günnemann

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning


[1270]
V. Melnychuk, S. Feuerriegel and M. van der Schaar.
Quantifying Aleatoric Uncertainty of the Treatment Effect: A Novel Orthogonal Learner.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published.
Abstract

Estimating causal quantities from observational data is crucial for understanding the safety and effectiveness of medical treatments. However, to make reliable inferences, medical practitioners require not only estimating averaged causal quantities, such as the conditional average treatment effect, but also understanding the randomness of the treatment effect as a random variable. This randomness is referred to as aleatoric uncertainty and is necessary for understanding the probability of benefit from treatment or quantiles of the treatment effect. Yet, the aleatoric uncertainty of the treatment effect has received surprisingly little attention in the causal machine learning community. To fill this gap, we aim to quantify the aleatoric uncertainty of the treatment effect at the individualized (covariate-conditional) level, namely, the conditional distribution of the treatment effect (CDTE). Unlike average causal quantities, the CDTE is not point identifiable without strong additional assumptions. As a remedy, we employ partial identification to obtain sharp bounds on the CDTE and thereby quantify the aleatoric uncertainty of the treatment effect. We then develop a novel, orthogonal learner for the bounds on the CDTE, which we call AU-learner. We further show that our AU-learner has several strengths in that it satisfies Neyman-orthogonality and is doubly robust. Finally, we propose a fully-parametric deep learning instantiation of our AU-learner.

MCML Authors
Link to website

Valentyn Melnychuk

Artificial Intelligence in Management

Link to Profile Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1269]
E. Ailer, N. Dern, J. Hartford and N. Kilbertus.
Targeted Sequential Indirect Experiment Design.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

Scientific hypotheses typically concern specific aspects of complex, imperfectly understood or entirely unknown mechanisms, such as the effect of gene expression levels on phenotypes or how microbial communities influence environmental health. Such queries are inherently causal (rather than purely associational), but in many settings, experiments cannot be conducted directly on the target variables of interest and are instead indirect: they perturb the target variable but do not remove potential confounding factors. If, additionally, the resulting experimental measurements are multi-dimensional and the studied mechanisms nonlinear, the query of interest is generally not identified. We develop an adaptive strategy to design indirect experiments that optimally inform a targeted query about the ground truth mechanism in terms of sequentially narrowing the gap between an upper and lower bound on the query. While the general formulation consists of a bi-level optimization procedure, we derive an efficiently estimable analytical kernel-based estimator of the bounds for the causal effect, a query of key interest, and demonstrate the efficacy of our approach in confounded, multivariate, nonlinear synthetic settings.

MCML Authors
Link to website

Elisabeth Ailer

Ethics in Systems Design and Machine Learning

Link to Profile Niki Kilbertus

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning


[1268]
R. Dhahri, A. Immer, B. Charpentier, S. Günnemann and V. Fortuin.
Shaving Weights with Occam's Razor: Bayesian Sparsification for Neural Networks Using the Marginal Likelihood.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

Neural network sparsification is a promising avenue to save computational time and memory costs, especially in an age where many successful AI models are becoming too large to naïvely deploy on consumer hardware. While much work has focused on different weight pruning criteria, the overall sparsifiability of the network, i.e., its capacity to be pruned without quality loss, has often been overlooked. We present Sparsifiability via the Marginal likelihood (SpaM), a pruning framework that highlights the effectiveness of using the Bayesian marginal likelihood in conjunction with sparsity-inducing priors for making neural networks more sparsifiable. Our approach implements an automatic Occam’s razor that selects the most sparsifiable model that still explains the data well, both for structured and unstructured sparsification. In addition, we demonstrate that the pre-computed posterior Hessian approximation used in the Laplace approximation can be re-used to define a cheap pruning criterion, which outperforms many existing (more expensive) approaches. We demonstrate the effectiveness of our framework, especially at high sparsity levels, across a range of different neural network architectures and datasets.
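Hessian-based pruning criteria of the kind mentioned above go back to Optimal Brain Damage, where the saliency of a weight is the loss increase estimated from a diagonal Hessian. The sketch below shows that classic criterion, not SpaM's Laplace-based one; all names and numbers are our illustration.

```python
def obd_saliency(weights, hessian_diag):
    """OBD-style saliency w_i^2 * H_ii / 2: estimated loss increase
    when weight i is set to zero, under a diagonal Hessian."""
    return [0.5 * w * w * h for w, h in zip(weights, hessian_diag)]

def prune_mask(saliency, keep_frac):
    """Keep the top keep_frac fraction of weights by saliency."""
    k = max(1, int(len(saliency) * keep_frac))
    threshold = sorted(saliency, reverse=True)[k - 1]
    return [s >= threshold for s in saliency]

sal = obd_saliency([1.0, 0.1, 2.0, 0.5], [1.0, 1.0, 1.0, 1.0])
mask = prune_mask(sal, 0.5)  # keep the two most salient weights
```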

MCML Authors
Link to Profile Stephan Günnemann

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

Link to Profile Vincent Fortuin

Vincent Fortuin

Dr.

Bayesian Deep Learning


[1267]
L. Eyring, S. Karthik, K. Roth, A. Dosovitskiy and Z. Akata.
ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv GitHub
Abstract

Text-to-Image (T2I) models have made significant advancements in recent years, but they still struggle to accurately capture intricate details specified in complex compositional prompts. While fine-tuning T2I models with reward objectives has shown promise, it suffers from ‘reward hacking’ and may not generalize well to unseen prompt distributions. In this work, we propose Reward-based Noise Optimization (ReNO), a novel approach that enhances T2I models at inference by optimizing the initial noise based on the signal from one or multiple human preference reward models. Remarkably, solving this optimization problem with gradient ascent for 50 iterations yields impressive results on four different one-step models across two competitive benchmarks, T2I-CompBench and GenEval. Within a computational budget of 20-50 seconds, ReNO-enhanced one-step models consistently surpass the performance of all current open-source Text-to-Image models. Extensive user studies demonstrate that our model is preferred nearly twice as often compared to the popular SDXL model and is on par with the proprietary Stable Diffusion 3 with 8B parameters. Moreover, given the same computational resources, a ReNO-optimized one-step model outperforms widely-used open-source models such as SDXL and PixArt-α, highlighting the efficiency and effectiveness of ReNO in enhancing T2I model performance at inference time.
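The optimization loop at the heart of the approach, gradient ascent on the initial noise to maximize a reward, can be sketched abstractly. Below, the reward is a toy quadratic with an analytic gradient; the real method backpropagates reward-model signals through a one-step T2I generator, which is not reproduced here.

```python
import numpy as np

def optimize_noise(reward_grad, z0, steps=50, lr=0.1):
    """Gradient ascent on the initial noise z to maximize a reward."""
    z = z0.copy()
    for _ in range(steps):
        z = z + lr * reward_grad(z)
    return z

# Toy reward r(z) = -||z - target||^2 with gradient 2 * (target - z)
target = np.array([1.0, -2.0])
grad = lambda z: 2.0 * (target - z)
z_opt = optimize_noise(grad, np.zeros(2))  # converges toward target
```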

MCML Authors
Link to Profile Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1266]
F. Hoppe, C. M. Verdun, H. Laus, F. Krahmer and H. Rauhut.
Non-Asymptotic Uncertainty Quantification in High-Dimensional Learning.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

Uncertainty quantification (UQ) is a crucial but challenging task in many high-dimensional regression or learning problems to increase the confidence of a given predictor. We develop a new data-driven approach for UQ in regression that applies both to classical regression approaches such as the LASSO as well as to neural networks. One of the most notable UQ techniques is the debiased LASSO, which modifies the LASSO to allow for the construction of asymptotic confidence intervals by decomposing the estimation error into a Gaussian and an asymptotically vanishing bias component. However, in real-world problems with finite-dimensional data, the bias term is often too significant to be neglected, resulting in overly narrow confidence intervals. Our work rigorously addresses this issue and derives a data-driven adjustment that corrects the confidence intervals for a large class of predictors by estimating the means and variances of the bias terms from training data, exploiting high-dimensional concentration phenomena. This gives rise to non-asymptotic confidence intervals, which can help avoid overestimating uncertainty in critical applications such as MRI diagnosis. Importantly, our analysis extends beyond sparse regression to data-driven predictors like neural networks, enhancing the reliability of model-based deep learning. Our findings bridge the gap between established theory and the practical applicability of such debiased methods.

MCML Authors
Link to website

Claudio Mayrink Verdun

Dr.

* Former member

Link to website

Hannah Laus

Optimization & Data Analysis

Link to Profile Felix Krahmer

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Link to Profile Holger Rauhut

Holger Rauhut

Prof. Dr.

Mathematical Data Science and Artificial Intelligence


[1265]
A. Javanmardi, D. Stutz and E. Hüllermeier.
Conformalized Credal Set Predictors.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

Credal sets are sets of probability distributions that are considered as candidates for an imprecisely known ground-truth distribution. In machine learning, they have recently attracted attention as an appealing formalism for uncertainty representation, in particular due to their ability to represent both the aleatoric and epistemic uncertainty in a prediction. However, the design of methods for learning credal set predictors remains a challenging problem. In this paper, we make use of conformal prediction for this purpose. More specifically, we propose a method for predicting credal sets in the classification task, given training data labeled by probability distributions. Since our method inherits the coverage guarantees of conformal prediction, our conformal credal sets are guaranteed to be valid with high probability (without any assumptions on model or distribution). We demonstrate the applicability of our method to natural language inference, a highly ambiguous natural language task where it is common to obtain multiple annotations per example.
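The coverage guarantee inherited from conformal prediction rests on a simple mechanism: a finite-sample-corrected quantile of calibration nonconformity scores defines a threshold, and the prediction set collects every candidate below it. A minimal split-conformal sketch (labels and scores are invented; the paper's credal-set construction builds on this idea but differs in detail):

```python
import math

def conformal_quantile(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration
    nonconformity scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(cal_scores)[min(k, n) - 1]

def prediction_set(label_scores, qhat):
    """All labels whose nonconformity score is within the threshold."""
    return {y for y, s in label_scores.items() if s <= qhat}

qhat = conformal_quantile([0.1, 0.2, 0.3, 0.4], alpha=0.5)
labels = prediction_set({"entailment": 0.25, "contradiction": 0.35}, qhat)
```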

MCML Authors
Link to website

Alireza Javanmardi

Artificial Intelligence & Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[1264]
A. H. Kargaran, F. Yvon and H. Schütze.
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

The need for large text corpora has increased with the advent of pretrained language models and, in particular, the discovery of scaling laws for these models. Most available corpora have sufficient data only for languages with large dominant communities. However, there is no corpus available that (i) covers a wide range of minority languages; (ii) is generated by an open-source reproducible pipeline; and (iii) is rigorously cleaned from noise, making it trustworthy to use. We present GlotCC, a clean, document-level, 2TB general domain corpus derived from CommonCrawl, covering more than 1000 languages. We make GlotCC and the system used to generate it - including the pipeline, language identification model, and filters - available to the research community.

MCML Authors
Link to website

Amir Hossein Kargaran

Statistical NLP and Deep Learning

Link to Profile Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1263]
F. Koehler, S. Niedermayr, R. Westermann and N. Thuerey.
APEBench: A Benchmark for Autoregressive Neural Emulators of PDEs.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

We introduce the Autoregressive PDE Emulator Benchmark (APEBench), a comprehensive benchmark suite to evaluate autoregressive neural emulators for solving partial differential equations. APEBench is based on JAX and provides a seamlessly integrated differentiable simulation framework employing efficient pseudo-spectral methods, enabling 46 distinct PDEs across 1D, 2D, and 3D. Facilitating systematic analysis and comparison of learned emulators, we propose a novel taxonomy for unrolled training and introduce a unique identifier for PDE dynamics that directly relates to the stability criteria of classical numerical methods. APEBench enables the evaluation of diverse neural architectures, and unlike existing benchmarks, its tight integration of the solver enables support for differentiable physics training and neural-hybrid emulators. Moreover, APEBench emphasizes rollout metrics to understand temporal generalization, providing insights into the long-term behavior of emulating PDE dynamics. In several experiments, we highlight the similarities between neural emulators and numerical simulators.

MCML Authors
Link to Profile Rüdiger Westermann

Rüdiger Westermann

Prof. Dr.

Computer Graphics & Visualization

Link to Profile Nils Thuerey

Nils Thuerey

Prof. Dr.

Physics-based Simulation


[1262]
G. Ma, Y. Wang, D. Lim, S. Jegelka and Y. Wang.
A Canonicalization Perspective on Invariant and Equivariant Learning.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv GitHub
Abstract

In many applications, we desire neural networks to exhibit invariance or equivariance to certain groups due to symmetries inherent in the data. Recently, frame-averaging methods emerged to be a unified framework for attaining symmetries efficiently by averaging over input-dependent subsets of the group, i.e., frames. What we currently lack is a principled understanding of the design of frames. In this work, we introduce a canonicalization perspective that provides an essential and complete view of the design of frames. Canonicalization is a classic approach for attaining invariance by mapping inputs to their canonical forms. We show that there exists an inherent connection between frames and canonical forms. Leveraging this connection, we can efficiently compare the complexity of frames as well as determine the optimality of certain frames. Guided by this principle, we design novel frames for eigenvectors that are strictly superior to existing methods – some are even optimal – both theoretically and empirically. The reduction to the canonicalization perspective further uncovers equivalences between previous methods. These observations suggest that canonicalization provides a fundamental understanding of existing frame-averaging methods and unifies existing equivariant and invariant learning methods.
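The canonicalization idea itself is easy to state: map every input to a fixed representative of its orbit, then apply any function. For permutation symmetry, sorting is a valid canonical form, as the toy sketch below shows (this illustrates the classic concept only, not the paper's frame constructions for eigenvectors):

```python
def canonicalize(x):
    """Map an input to a canonical representative of its orbit;
    under permutations, sorting is a valid canonical form."""
    return tuple(sorted(x))

def make_invariant(f):
    """Turn any function into a permutation-invariant one by
    evaluating it on the canonical form."""
    return lambda x: f(canonicalize(x))

g = make_invariant(sum)
same = g([3, 1, 2]) == g([2, 3, 1])  # both calls see (1, 2, 3)
```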

MCML Authors
Link to Profile Stefanie Jegelka

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks


[1261]
Y. Ma, V. Melnychuk, J. Schweisthal and S. Feuerriegel.
DiffPO: A causal diffusion model for learning distributions of potential outcomes.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

Predicting potential outcomes of interventions from observational data is crucial for decision-making in medicine, but the task is challenging due to the fundamental problem of causal inference. Existing methods are largely limited to point estimates of potential outcomes with no uncertainty quantification; thus, the full information about the distributions of potential outcomes is typically ignored. In this paper, we propose a novel causal diffusion model called DiffPO, which is carefully designed for reliable inferences in medicine by learning the distribution of potential outcomes. In our DiffPO, we leverage a tailored conditional denoising diffusion model to learn complex distributions, where we address the selection bias through a novel orthogonal diffusion loss. Another strength of our DiffPO method is that it is highly flexible (e.g., it can also be used to estimate different causal quantities such as CATE). Across a wide range of experiments, we show that our method achieves state-of-the-art performance.

MCML Authors
Link to website

Yuchen Ma

Artificial Intelligence in Management

Link to website

Valentyn Melnychuk

Artificial Intelligence in Management

Link to website

Jonas Schweisthal

Artificial Intelligence in Management

Link to Profile Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1260]
M. Muschalik, H. Baniecki, F. Fumagalli, P. Kolpaczki, B. Hammer and E. Hüllermeier.
shapiq: Shapley Interactions for Machine Learning.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

Originally rooted in game theory, the Shapley Value (SV) has recently become an important tool in machine learning research. Perhaps most notably, it is used for feature attribution and data valuation in explainable artificial intelligence. Shapley Interactions (SIs) naturally extend the SV and address its limitations by assigning joint contributions to groups of entities, enhancing the understanding of black-box machine learning models. Due to the exponential complexity of computing SVs and SIs, various methods have been proposed that exploit structural assumptions or yield probabilistic estimates given limited resources. In this work, we introduce shapiq, an open-source Python package that unifies state-of-the-art algorithms to efficiently compute SVs and any-order SIs in an application-agnostic framework. Moreover, it includes a benchmarking suite containing 11 machine learning applications of SIs with pre-computed games and ground-truth values to systematically assess computational performance across domains. For practitioners, shapiq is able to explain and visualize any-order feature interactions in predictions of models, including vision transformers, language models, as well as XGBoost and LightGBM with TreeSHAP-IQ. With shapiq, we extend shap beyond feature attributions and consolidate the application of SVs and SIs in machine learning that facilitates future research.
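The exponential complexity mentioned above comes straight from the Shapley value's definition: every weight averages marginal contributions over all coalitions. The brute-force sketch below implements that textbook definition only; it is not the shapiq API, whose approximators exist precisely to avoid this enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values by enumerating all coalitions
    (O(2^n) evaluations of the value function)."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(set(S) | {i}) - value(set(S)))
        phi[i] = total
    return phi

# For an additive game, each player's Shapley value is its own payoff
phi = shapley_values([1, 2, 3], lambda S: sum(S))
```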

MCML Authors
Link to website

Maximilian Muschalik

Artificial Intelligence & Machine Learning

Link to website

Patrick Kolpaczki

Artificial Intelligence & Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[1259]
T. Nagler, L. Schneider, B. Bischl and M. Feurer.
Reshuffling Resampling Splits Can Improve Generalization of Hyperparameter Optimization.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv GitHub
Abstract

Hyperparameter optimization is crucial for obtaining peak performance of machine learning models. The standard protocol evaluates various hyperparameter configurations using a resampling estimate of the generalization error to guide optimization and select a final hyperparameter configuration. Without much evidence, paired resampling splits, i.e., either a fixed train-validation split or a fixed cross-validation scheme, are often recommended. We show that, surprisingly, reshuffling the splits for every configuration often improves the final model’s generalization performance on unseen data. Our theoretical analysis explains how reshuffling affects the asymptotic behavior of the validation loss surface and provides a bound on the expected regret in the limiting regime. This bound connects the potential benefits of reshuffling to the signal and noise characteristics of the underlying optimization problem. We confirm our theoretical results in a controlled simulation study and demonstrate the practical usefulness of reshuffling in a large-scale, realistic hyperparameter optimization experiment. While reshuffling leads to test performances that are competitive with using fixed splits, it drastically improves results for a single train-validation holdout protocol and can often make holdout competitive with standard CV while being computationally cheaper.
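The protocol change being studied is mechanically simple: instead of reusing one fixed train-validation split for all configurations, draw a fresh split per configuration. A minimal sketch (function name and sizes are our illustration, not the paper's experiment code):

```python
import random

def reshuffled_holdout(n, val_frac=0.2, rng=None):
    """Draw a fresh train/validation index split; called once per
    hyperparameter configuration instead of reusing a fixed split."""
    rng = rng or random.Random()
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(n * (1 - val_frac))
    return idx[:cut], idx[cut:]

# Each hyperparameter configuration gets its own reshuffled split
rng = random.Random(0)
splits = [reshuffled_holdout(100, rng=rng) for _ in range(3)]
```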

MCML Authors
Link to Profile Thomas Nagler

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

Link to website

Lennart Schneider

Statistical Learning & Data Science

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Profile Matthias Feurer

Matthias Feurer

Prof. Dr.

Statistical Learning & Data Science


[1258]
R. Paolino, S. Maskey, P. Welke and G. Kutyniok.
Weisfeiler and Leman Go Loopy: A New Hierarchy for Graph Representational Learning.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv GitHub
Abstract

We introduce r-loopy Weisfeiler-Leman (r-ℓWL), a novel hierarchy of graph isomorphism tests and a corresponding GNN framework, r-ℓMPNN, that can count cycles up to length r+2. Most notably, we show that r-ℓWL can count homomorphisms of cactus graphs. This strictly extends classical 1-WL, which can only count homomorphisms of trees and, in fact, is incomparable to k-WL for any fixed k. We empirically validate the expressive and counting power of the proposed r-ℓMPNN on several synthetic datasets and present state-of-the-art predictive performance on various real-world datasets.

MCML Authors
Link to website

Raffaele Paolino

Mathematical Foundations of Artificial Intelligence

Link to website

Sohir Maskey

Mathematical Foundations of Artificial Intelligence

Link to Profile Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[1257]
K. Roth, V. Udandarao, S. Dziadzio, A. Prabhu, M. Cherti, O. Vinyals, O. Hénaff, S. Albanie, M. Bethge and Z. Akata.
A Practitioner's Guide to Continual Multimodal Pretraining.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv GitHub
Abstract

Multimodal foundation models serve numerous applications at the intersection of vision and language. Still, despite being pretrained on extensive data, they become outdated over time. To keep models updated, research into continual pretraining mainly explores scenarios with either (1) infrequent, indiscriminate updates on large-scale new data, or (2) frequent, sample-level updates. However, practical model deployment often operates in the gap between these two limit cases, as real-world applications often demand adaptation to specific subdomains, tasks or concepts – spread over the entire, varying life cycle of a model. In this work, we complement current perspectives on continual pretraining through a research test bed as well as provide comprehensive guidance for effective continual model updates in such scenarios. We first introduce FoMo-in-Flux, a continual multimodal pretraining benchmark with realistic compute constraints and practical deployment requirements, constructed over 63 datasets with diverse visual and semantic coverage. Using FoMo-in-Flux, we explore the complex landscape of practical continual pretraining through multiple perspectives: (1) A data-centric investigation of data mixtures and stream orderings that emulate real-world deployment situations, (2) a method-centric investigation ranging from simple fine-tuning and traditional continual learning strategies to parameter-efficient updates and model merging, (3) meta learning rate schedules and mechanistic design choices, and (4) the influence of model and compute scaling. Together, our insights provide a practitioner’s guide to continual multimodal pretraining for real-world deployment.

MCML Authors
Link to Profile Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1256]
D. Rügamer, B. X. W. Liew, Z. Altai and A. Stöcker.
A Functional Extension of Semi-Structured Networks.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

Semi-structured networks (SSNs) merge the structures familiar from additive models with deep neural networks, allowing the modeling of interpretable partial feature effects while capturing higher-order non-linearities at the same time. A significant challenge in this integration is maintaining the interpretability of the additive model component. Inspired by large-scale biomechanics datasets, this paper explores extending SSNs to functional data. Existing methods in functional data analysis are promising but often not expressive enough to account for all interactions and non-linearities and do not scale well to large datasets. Although the SSN approach presents a compelling potential solution, its adaptation to functional data remains complex. In this work, we propose a functional SSN method that retains the advantageous properties of classical functional regression approaches while also improving scalability. Our numerical experiments demonstrate that this approach accurately recovers underlying signals, enhances predictive performance, and performs favorably compared to competing methods.

MCML Authors
Link to Profile David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[1255]
R. Stolz, H. Krasowski, J. Thumm, M. Eichelbeck, P. Gassert and M. Althoff.
Excluding the Irrelevant: Focusing Reinforcement Learning through Continuous Action Masking.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

Continuous action spaces in reinforcement learning (RL) are commonly defined as multidimensional intervals. While intervals usually reflect the action boundaries for tasks well, they can be challenging for learning because the typically large global action space leads to frequent exploration of irrelevant actions. Yet, little task knowledge can be sufficient to identify significantly smaller state-specific sets of relevant actions. Focusing learning on these relevant actions can significantly improve training efficiency and effectiveness. In this paper, we propose to focus learning on the set of relevant actions and introduce three continuous action masking methods for exactly mapping the action space to the state-dependent set of relevant actions. Thus, our methods ensure that only relevant actions are executed, enhancing the predictability of the RL agent and enabling its use in safety-critical applications. We further derive the implications of the proposed methods on the policy gradient. Using proximal policy optimization (PPO), we evaluate our methods on four control tasks, where the relevant action set is computed based on the system dynamics and a relevant state set. Our experiments show that the three action masking methods achieve higher final rewards and converge faster than the baseline without action masking.
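The paper derives three distinct masking methods; as a minimal illustrative sketch of the underlying idea only (assuming the state-dependent relevant set is an axis-aligned interval per action dimension — an assumption of this example, not the paper's exact formulation), a policy output in [-1, 1] can be rescaled into the relevant bounds so that only relevant actions are ever executed:

```python
import numpy as np

def masked_action(raw_action, low, high):
    """Affinely rescale a policy output from [-1, 1] into the
    state-dependent relevant interval [low, high], per dimension."""
    raw = np.clip(np.asarray(raw_action, dtype=float), -1.0, 1.0)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    return low + 0.5 * (raw + 1.0) * (high - low)

# A state where only actions in [0.2, 0.5] (dim 0) and [-0.1, 0.1] (dim 1)
# are relevant:
a = masked_action([0.0, 1.0], low=[0.2, -0.1], high=[0.5, 0.1])
# a ≈ [0.35, 0.1]
```

How such a mapping interacts with the policy gradient is exactly what the paper analyzes; this sketch only shows the executed-action side.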

MCML Authors
Link to website

Hanna Krasowski

Dr.

* Former member

Link to website

Michael Eichelbeck

Cyber Physical Systems

Link to website

Philipp Gassert

Cyber Physical Systems

Link to Profile Matthias Althoff

Matthias Althoff

Prof. Dr.

Cyber Physical Systems


[1254]
J. Wang, M. Ghahremani, Y. Li, B. Ommer and C. Wachinger.
Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv GitHub
Abstract

Controllable text-to-image (T2I) diffusion models have shown impressive performance in generating high-quality visual content through the incorporation of various conditions. Current methods, however, exhibit limited performance when guided by skeleton human poses, especially in complex pose conditions such as side or rear perspectives of human figures. To address this issue, we present Stable-Pose, a novel adapter model that introduces a coarse-to-fine attention masking strategy into a vision Transformer (ViT) to gain accurate pose guidance for T2I models. Stable-Pose is designed to adeptly handle pose conditions within pre-trained Stable Diffusion, providing a refined and efficient way of aligning pose representation during image synthesis. We leverage the query-key self-attention mechanism of ViTs to explore the interconnections among different anatomical parts in human pose skeletons. Masked pose images are used to smoothly refine the attention maps based on target pose-related features in a hierarchical manner, transitioning from coarse to fine levels. Additionally, our loss function is formulated to allocate increased emphasis to the pose region, thereby augmenting the model’s precision in capturing intricate pose details. We assessed the performance of Stable-Pose across five public datasets under a wide range of indoor and outdoor human pose scenarios. Stable-Pose achieved an AP score of 57.1 in the LAION-Human dataset, marking around 13% improvement over the established technique ControlNet.

MCML Authors
Link to website

Morteza Ghahremani

Dr.

Artificial Intelligence in Radiology

Link to website

Yitong Li

Artificial Intelligence in Radiology

Link to Profile Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Radiology


[1253]
Y. Wang, K. Hu, S. Gupta, Z. Ye, Y. Wang and S. Jegelka.
Understanding the Role of Equivariance in Self-supervised Learning.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv GitHub
Abstract

Contrastive learning has been a leading paradigm for self-supervised learning, but it is widely observed that it comes at the price of sacrificing useful features (e.g., colors) by being invariant to data augmentations. Given this limitation, there has been a surge of interest in equivariant self-supervised learning (E-SSL) that learns features to be augmentation-aware. However, even for the simplest rotation prediction method, there is a lack of rigorous understanding of why, when, and how E-SSL learns useful features for downstream tasks. To bridge this gap between practice and theory, we establish an information-theoretic perspective to understand the generalization ability of E-SSL. In particular, we identify a critical explaining-away effect in E-SSL that creates a synergy between the equivariant and classification tasks. This synergy effect encourages models to extract class-relevant features to improve their equivariant prediction, which, in turn, benefits downstream tasks requiring semantic features. Based on this perspective, we theoretically analyze the influence of data transformations and reveal several principles for practical designs of E-SSL. Our theory not only aligns well with existing E-SSL methods but also sheds light on new directions by exploring the benefits of model equivariance. We believe that a theoretically grounded understanding of the role of equivariance would inspire more principled and advanced designs in this field.

MCML Authors
Link to Profile Stefanie Jegelka

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks


[1252]
Y. Wang, Y. Wu, Z. Wei, S. Jegelka and Y. Wang.
A Theoretical Understanding of Self-Correction through In-context Alignment.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

Going beyond mimicking limited human experiences, recent studies show initial evidence that, like humans, large language models (LLMs) are capable of improving their abilities purely by self-correction, i.e., correcting previous responses through self-examination, in certain circumstances. Nevertheless, little is known about how such capabilities arise. In this work, based on a simplified setup akin to an alignment task, we theoretically analyze self-correction from an in-context learning perspective, showing that when LLMs give relatively accurate self-examinations as rewards, they are capable of refining responses in an in-context way. Notably, going beyond previous theories on over-simplified linear transformers, our theoretical construction underpins the roles of several key designs of realistic transformers for self-correction: softmax attention, multi-head attention, and the MLP block. We validate these findings extensively on synthetic datasets. Inspired by these findings, we also illustrate novel applications of self-correction, such as defending against LLM jailbreaks, where a simple self-correction step does make a large difference. We believe that these findings will inspire further research on understanding, exploiting, and enhancing self-correction for building better foundation models.

MCML Authors
Link to Profile Stefanie Jegelka

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks


[1251]
D. Winkel, N. Strauß, M. Bernhard, Z. Li, T. Seidl and M. Schubert.
Autoregressive Policy Optimization for Constrained Allocation Tasks.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv GitHub
Abstract

Allocation tasks represent a class of problems where a limited amount of resources must be allocated to a set of entities at each time step. Prominent examples of this task include portfolio optimization or distributing computational workloads across servers. Allocation tasks are typically bound by linear constraints describing practical requirements that have to be strictly fulfilled at all times. In portfolio optimization, for example, investors may be obligated to allocate less than 30% of the funds into a certain industrial sector in any investment period. Such constraints restrict the action space of allowed allocations in intricate ways, which makes learning a policy that avoids constraint violations difficult. In this paper, we propose a new method for constrained allocation tasks based on an autoregressive process to sequentially sample allocations for each entity. In addition, we introduce a novel de-biasing mechanism to counter the initial bias caused by sequential sampling. We demonstrate the superior performance of our approach compared to a variety of Constrained Reinforcement Learning (CRL) methods on three distinct constrained allocation tasks: portfolio optimization, computational workload distribution, and a synthetic allocation benchmark.
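As a hedged illustration of the autoregressive sampling idea (uniform draws stand in for the learned policy, and the paper's de-biasing mechanism is omitted), allocations can be sampled entity by entity while keeping the remaining budget feasible under per-entity caps such as the 30%-sector constraint above:

```python
import random

def sample_allocation(n_entities, caps, total=1.0):
    """Sequentially sample an allocation summing to `total` while
    respecting per-entity upper bounds `caps[i]`. Illustrative only:
    the paper samples from a learned autoregressive policy."""
    alloc = []
    remaining = total
    for i in range(n_entities):
        # Feasibility: leave enough budget for the later entities to absorb.
        lo = max(0.0, remaining - sum(caps[i + 1:]))
        hi = min(caps[i], remaining)
        x = hi if i == n_entities - 1 else random.uniform(lo, hi)
        alloc.append(x)
        remaining -= x
    return alloc

# Three entities, each capped at 30%, 30%, and 50% of the budget:
alloc = sample_allocation(3, [0.3, 0.3, 0.5])
```

Because earlier entities constrain later ones, this sequential scheme is biased toward the front of the ordering — the motivation for the de-biasing mechanism the paper introduces.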

MCML Authors
Link to website

David Winkel

Database Systems & Data Mining

Link to website

Niklas Strauß

Database Systems & Data Mining

Link to website

Maximilian Bernhard

Database Systems & Data Mining

Link to website

Zongyue Li

Database Systems & Data Mining

Link to Profile Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining

Link to Profile Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[1250]
M. Yau, N. Karalias, E. Lu, J. Xu and S. Jegelka.
Are Graph Neural Networks Optimal Approximation Algorithms?
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

In this work we design graph neural network architectures that capture optimal approximation algorithms for a large class of combinatorial optimization problems, using powerful algorithmic tools from semidefinite programming (SDP). Concretely, we prove that polynomial-sized message-passing algorithms can represent the most powerful polynomial time algorithms for Max Constraint Satisfaction Problems assuming the Unique Games Conjecture. We leverage this result to construct efficient graph neural network architectures, OptGNN, that obtain high-quality approximate solutions on landmark combinatorial optimization problems such as Max-Cut, Min-Vertex-Cover, and Max-3-SAT. Our approach achieves strong empirical results across a wide range of real-world and synthetic datasets against solvers and neural baselines. Finally, we take advantage of OptGNN’s ability to capture convex relaxations to design an algorithm for producing bounds on the optimal solution from the learned embeddings of OptGNN.

MCML Authors
Link to Profile Stefanie Jegelka

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks


[1249]
Y. Zhang, Y. Li, X. Wang, Q. Shen, B. Plank, B. Bischl, M. Rezaei and K. Kawaguchi.
FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models.
NeurIPS 2024 - Workshop on Machine Learning and Compression at the 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

Overparametrized transformer networks are the state-of-the-art architecture for Large Language Models (LLMs). However, such models contain billions of parameters, making large compute a necessity, while raising environmental concerns. To address these issues, we propose FinerCut, a new form of fine-grained layer pruning, which in contrast to prior work at the transformer block level, considers all self-attention and feed-forward network (FFN) layers within blocks as individual pruning candidates. FinerCut prunes layers whose removal causes minimal alteration to the model’s output – contributing to a new, lean, interpretable, and task-agnostic pruning method. Tested across 9 benchmarks, our approach retains 90% performance of Llama3-8B with 25% layers removed, and 95% performance of Llama3-70B with 30% layers removed, all without fine-tuning or post-pruning reconstruction. Strikingly, we observe intriguing results with FinerCut: 42% (34 out of 80) of the self-attention layers in Llama3-70B can be removed while preserving 99% of its performance – without additional fine-tuning after removal. Moreover, FinerCut provides a tool to inspect the types and locations of pruned layers, allowing us to observe interesting pruning behaviors. For instance, we observe a preference for pruning self-attention layers, often at deeper consecutive decoder layers. We hope our insights inspire future efficient LLM architecture designs.
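A toy sketch of the pruning criterion — score each layer by how much the model output changes when that layer is skipped, then treat the lowest-scoring layers as pruning candidates. The residual network below is synthetic; FinerCut applies this idea to the attention and FFN layers of real LLMs, so every name here is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)
# Toy residual network: each "layer" adds a small update, so any layer
# can be skipped without shape mismatches (as in transformer blocks).
layers = [rng.normal(scale=0.1, size=(8, 8)) for _ in range(5)]

def forward(x, skip=None):
    h = x
    for i, W in enumerate(layers):
        if i != skip:
            h = h + W @ h  # residual update
    return h

full = forward(x)
# Importance score: mean squared change of the output when layer i is removed.
scores = {i: float(np.mean((full - forward(x, skip=i)) ** 2))
          for i in range(len(layers))}
prune_order = sorted(scores, key=scores.get)  # least-impactful layers first
```

In the paper the comparison is done on model outputs over calibration data rather than a single random input, but the ranking principle is the same.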

MCML Authors
Link to website

Yawei Li

Statistical Learning & Data Science

Link to website

Xinpeng Wang

Artificial Intelligence and Computational Linguistics

Link to Profile Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to website

Mina Rezaei

Dr.

Statistical Learning & Data Science


[1248]
M. Koshil, T. Nagler, M. Feurer and K. Eggensperger.
Towards Localization via Data Embedding for TabPFN.
TLR @NeurIPS 2024 - 3rd Table Representation Learning Workshop at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. URL
Abstract

Prior-data fitted networks (PFNs), especially TabPFN, have shown significant promise in tabular data prediction. However, their scalability is limited by the quadratic complexity of the transformer architecture’s attention across training points. In this work, we propose a method to localize TabPFN, which embeds data points into a learned representation and performs nearest neighbor selection in this space. We evaluate it across six datasets, demonstrating its superior performance over standard TabPFN when scaling to larger datasets. We also explore its design choices and analyze the bias-variance trade-off of this localization method, showing that it reduces bias while maintaining manageable variance. This work opens up a pathway for scaling TabPFN to arbitrarily large tabular datasets.
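The localization step can be sketched as nearest-neighbor selection in an embedding space; only the selected points are then passed as context to the quadratic-attention in-context learner. The learned embedding and the TabPFN forward pass are assumed to exist elsewhere — this is not the authors' code:

```python
import numpy as np

def select_context(query_emb, train_embs, k):
    """Return indices of the k training points closest to the query
    in the (learned) embedding space; these form TabPFN's context."""
    d = np.linalg.norm(train_embs - query_emb, axis=1)
    return np.argsort(d)[:k]

train_embs = np.array([[0.0], [1.0], [2.0], [10.0]])
idx = select_context(np.array([1.2]), train_embs, k=2)
# idx contains the two nearest points, indices 1 and 2
```

Keeping k fixed caps the attention cost per prediction regardless of the full training-set size, which is the scalability argument in the abstract.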

MCML Authors
Link to Profile Thomas Nagler

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

Link to Profile Matthias Feurer

Matthias Feurer

Prof. Dr.

Statistical Learning & Data Science


[1247]
C. Leiber, N. Strauß, M. Schubert and T. Seidl.
Dying Clusters Is All You Need – Deep Clustering With an Unknown Number of Clusters.
DLC @ICDM 2024 - 6th Workshop on Deep Learning and Clustering at the 24th IEEE International Conference on Data Mining (ICDM 2024). Abu Dhabi, United Arab Emirates, Dec 09-12, 2024. To be published. Preprint available. arXiv GitHub
Abstract

Finding meaningful groups, i.e., clusters, in high-dimensional data such as images or texts without labeled data at hand is an important challenge in data mining. In recent years, deep clustering methods have achieved remarkable results in these tasks. However, most of these methods require the user to specify the number of clusters in advance. This is a major limitation since the number of clusters is typically unknown if labeled data is unavailable. Thus, an area of research has emerged that addresses this problem. Most of these approaches estimate the number of clusters separately from the clustering process. This results in a strong dependency of the clustering result on the quality of the initial embedding. Other approaches are tailored to specific clustering processes, making them hard to adapt to other scenarios. In this paper, we propose UNSEEN, a general framework that, starting from a given upper bound, is able to estimate the number of clusters. To the best of our knowledge, it is the first method that can be easily combined with various deep clustering algorithms. We demonstrate the applicability of our approach by combining UNSEEN with the popular deep clustering algorithms DCN, DEC, and DKM and verify its effectiveness through an extensive experimental evaluation on several image and tabular datasets. Moreover, we perform numerous ablations to analyze our approach and show the importance of its components.

MCML Authors
Collin Leiber

Collin Leiber

* Former member

Link to website

Niklas Strauß

Database Systems & Data Mining

Link to Profile Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining

Link to Profile Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[1246]
U. Fischer Abaigar, C. Kern, N. Barda and F. Kreuter.
Bridging the gap: Towards an expanded toolkit for AI-driven decision-making in the public sector.
Government Information Quarterly 41.4 (Dec. 2024). DOI
Abstract

AI-driven decision-making systems are becoming instrumental in the public sector, with applications spanning areas like criminal justice, social welfare, financial fraud detection, and public health. While these systems offer great potential benefits to institutional decision-making processes, such as improved efficiency and reliability, these systems face the challenge of aligning machine learning (ML) models with the complex realities of public sector decision-making. In this paper, we examine five key challenges where misalignment can occur, including distribution shifts, label bias, the influence of past decision-making on the data side, as well as competing objectives and human-in-the-loop on the model output side. Our findings suggest that standard ML methods often rely on assumptions that do not fully account for these complexities, potentially leading to unreliable and harmful predictions. To address this, we propose a shift in modeling efforts from focusing solely on predictive accuracy to improving decision-making outcomes. We offer guidance for selecting appropriate modeling frameworks, including counterfactual prediction and policy learning, by considering how the model estimand connects to the decision-maker’s utility. Additionally, we outline technical methods that address specific challenges within each modeling approach. Finally, we argue for the importance of external input from domain experts and stakeholders to ensure that model assumptions and design choices align with real-world policy objectives, taking a step towards harmonizing AI and public sector objectives.

MCML Authors
Link to website

Unai Fischer Abaigar

Social Data Science and AI Lab

Link to Profile Christoph Kern

Christoph Kern

Prof. Dr.

Social Data Science and AI Lab

Link to Profile Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab


[1245]
L. Bothmann and K. Peters.
Fairness von KI – ein Brückenschlag zwischen Philosophie und Maschinellem Lernen.
Grenzen Künstlicher Intelligenz (Dec. 2024).
MCML Authors
Link to website

Ludwig Bothmann

Dr.

Statistical Learning & Data Science


[1244]
T. Hannan, R. Koner, M. Bernhard, S. Shit, B. Menze, V. Tresp, M. Schubert and T. Seidl.
GRAtt-VIS: Gated Residual Attention for Video Instance Segmentation.
ICPR 2024 - 27th International Conference on Pattern Recognition. Kolkata, India, Dec 01-05, 2024. DOI GitHub
Abstract

Recent trends in Video Instance Segmentation (VIS) have seen a growing reliance on online methods to model complex and lengthy video sequences. However, the degradation of representation and noise accumulation of the online methods, especially during occlusion and abrupt changes, pose substantial challenges. Transformer-based query propagation provides promising directions at the cost of quadratic memory attention. However, they are susceptible to the degradation of instance features due to the above-mentioned challenges and suffer from cascading effects. The detection and rectification of such errors remain largely underexplored. To this end, we introduce GRAtt-VIS, Gated Residual Attention for Video Instance Segmentation. Firstly, we leverage a Gumbel-Softmax-based gate to detect possible errors in the current frame. Next, based on the gate activation, we rectify degraded features from its past representation. Such a residual configuration alleviates the need for dedicated memory and provides a continuous stream of relevant instance features. Secondly, we propose a novel inter-instance interaction using gate activation as a mask for self-attention. This masking strategy dynamically restricts the unrepresentative instance queries in the self-attention and preserves vital information for long-term tracking. We refer to this novel combination of Gated Residual Connection and Masked Self-Attention as the GRAtt block, which can easily be integrated into the existing propagation-based framework. Further, GRAtt blocks significantly reduce the attention overhead and simplify dynamic temporal modeling. GRAtt-VIS achieves state-of-the-art performance on YouTube-VIS and the highly challenging OVIS dataset, significantly improving over previous methods.

MCML Authors
Link to website

Tanveer Hannan

Database Systems & Data Mining

Link to website

Rajat Koner

Database Systems & Data Mining

Link to website

Maximilian Bernhard

Database Systems & Data Mining

Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining

Link to Profile Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining

Link to Profile Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[1243]
Y. N. Böck, H. Boche, F. H. P. Fitzek and G. Kutyniok.
Computing-Model and Computing-Hardware Selection for ICT Under Societal and Judicial Constraints.
IEEE Access 12 (Dec. 2024). DOI
Abstract

This article discusses a formalization of aspects of Cyber-Sovereignty (CyS) for information and communication technology (ICT), linking them to technological trustworthiness and deriving an associated paradigm for hard- and software design. The upcoming 6G ICT standard is considered a keystone within modern society’s increasing interconnectedness and automatization, as it provides the necessary technological infrastructure for applications such as the Metaverse or large-scale digital twinning. Since emerging technological systems increasingly affect sensitive human goods, hard- and software manufacturers must consider a new dimension of societal and judicial constraints in the context of technological trustworthiness. This article aims to establish a formalized theory of specific aspects of CyS, providing a paradigm for hard- and software engineering in ICT. This paradigm is directly applicable in formal technology assessment and ensures that the relevant facets of CyS – specifically, the principle of Algorithmic Transparency (AgT) – are satisfied. The framework follows an axiomatic approach. Particularly, the formal basis of our theory consists of four fundamental assumptions about the general nature of physical problems and algorithmic implementations. This formal basis allows for drawing general conclusions on the relation between CyS and technological trustworthiness and entails a formal meta-thesis on AgT in digital computing.

MCML Authors
Link to Profile Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[1242]
A. Höhl, I. Obadic, M.-Á. Fernández-Torres, H. Najjar, D. Oliveira, Z. Akata, A. Dengel and X. Zhu.
Opening the Black Box: A systematic review on explainable artificial intelligence in remote sensing.
IEEE Geoscience and Remote Sensing Magazine 12.4 (Dec. 2024). DOI
Abstract

In recent years, black-box machine learning approaches have become a dominant modeling paradigm for knowledge extraction in remote sensing. Despite the potential benefits of uncovering the inner workings of these models with explainable AI, a comprehensive overview summarizing the explainable AI methods used and their objectives, findings, and challenges in remote sensing applications is still missing. In this paper, we address this gap by performing a systematic review to identify the key trends in the field and shed light on novel explainable AI approaches and emerging directions that tackle specific remote sensing challenges. We also reveal the common patterns of explanation interpretation, discuss the extracted scientific insights, and reflect on the approaches used for the evaluation of explainable AI methods. As such, our review provides a complete summary of the state-of-the-art of explainable AI in remote sensing. Further, we give a detailed outlook on the challenges and promising research directions, representing a basis for novel methodological development and a useful starting point for new researchers in the field.

MCML Authors
Link to website

Adrian Höhl

Data Science in Earth Observation

Link to website

Ivica Obadic

Data Science in Earth Observation

Link to Profile Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning

Link to Profile Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[1241]
S. Zhao, Z. Chen, Z. Xiong, Y. Shi, S. Saha and X. Zhu.
Beyond Grid Data: Exploring graph neural networks for Earth observation.
IEEE Geoscience and Remote Sensing Magazine Early access (Dec. 2024). DOI
Abstract

Earth Observation (EO) data analysis has been significantly revolutionized by deep learning (DL), with applications typically limited to grid-like data structures. Graph Neural Networks (GNNs) emerge as an important innovation, propelling DL into the non-Euclidean domain. Naturally, GNNs can effectively tackle the challenges posed by diverse modalities, multiple sensors, and the heterogeneous nature of EO data. To introduce GNNs in the related domains, our review begins by offering fundamental knowledge on GNNs. Then, we summarize the generic problems in EO, to which GNNs can offer potential solutions. Following this, we explore a broad spectrum of GNNs’ applications to scientific problems in Earth systems, covering areas such as weather and climate analysis, disaster management, air quality monitoring, agriculture, land cover classification, hydrological process modeling, and urban modeling. The rationale behind adopting GNNs in these fields is explained, alongside methodologies for organizing graphs and designing favorable architectures for various tasks. Furthermore, we highlight methodological challenges of implementing GNNs in these domains and possible solutions that could guide future research. While acknowledging that GNNs are not a universal solution, we conclude the paper by comparing them with other popular architectures like transformers and analyzing their potential synergies.

MCML Authors
Link to website

Zhaiyu Chen

Data Science in Earth Observation

Link to Profile Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[1240]
Q. Sun, A. Akman, X. Jing, M. Milling and B. W. Schuller.
Audio-based Kinship Verification Using Age Domain Conversion.
IEEE Signal Processing Letters 32 (Dec. 2024). DOI
Abstract

Audio-based kinship verification (AKV) is important in many domains, such as home security monitoring, forensic identification, and social network analysis. A key challenge in the task arises from differences in age across samples from different individuals, which can be interpreted as a domain bias in a cross-domain verification task. To address this issue, we design the notion of an ‘age-standardised domain’ wherein we utilise the optimised CycleGAN-VC3 network to perform age-audio conversion to generate the in-domain audio. The generated audio dataset is employed to extract a range of features, which are then fed into a metric learning architecture to verify kinship. Experiments are conducted on the KAN_AV audio dataset, which contains age and kinship labels. The results demonstrate that the method markedly enhances the accuracy of kinship verification, while also offering novel insights for future kinship verification research.

MCML Authors
Link to Profile Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[1239]
J. Herbinger, M. N. Wright, T. Nagler, B. Bischl and G. Casalicchio.
Decomposing Global Feature Effects Based on Feature Interactions.
Journal of Machine Learning Research 25.381 (Dec. 2024). URL
Abstract

Global feature effect methods, such as partial dependence plots, provide an intelligible visualization of the expected marginal feature effect. However, such global feature effect methods can be misleading, as they do not represent local feature effects of single observations well when feature interactions are present. We formally introduce generalized additive decomposition of global effects (GADGET), which is a new framework based on recursive partitioning to find interpretable regions in the feature space such that the interaction-related heterogeneity of local feature effects is minimized. We provide a mathematical foundation of the framework and show that it is applicable to the most popular methods to visualize marginal feature effects, namely partial dependence, accumulated local effects, and Shapley additive explanations (SHAP) dependence. Furthermore, we introduce and validate a new permutation-based interaction detection procedure that is applicable to any feature effect method that fits into our proposed framework. We empirically evaluate the theoretical characteristics of the proposed methods based on various feature effect methods in different experimental settings. Moreover, we apply our introduced methodology to three real-world examples to showcase their usefulness.

MCML Authors
Link to Profile Thomas Nagler

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to website

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science


[1238]
H. Weingärtner, M. Windl, L. L. Chuang and F. Draxler.
Useful but Distracting: Viewer Experience with Keyword Highlights and Time-Synchronization in Captions for Language Learning.
MUM 2024 - 23rd International Conference on Mobile and Ubiquitous Multimedia. Stockholm, Sweden, Dec 01-04, 2024. DOI
Abstract

Captions are a valuable scaffold for language learners, aiding comprehension and vocabulary acquisition. Past work has proposed enhancements such as keyword highlights for increased learning gains. However, little is known about learners’ experience with enhanced captions, although this is critical for adoption in everyday life. We conducted a survey and focus group to elicit learner preferences and requirements and implemented a processing pipeline for enhanced captions with keyword highlights, time-synchronized keyword highlights, and keyword captions. A subsequent online study (n = 66) showed that time-synchronized keyword highlights were the preferred design for learning but were perceived as too distracting to replace standard captions in everyday viewing scenarios. We conclude that keyword highlights and time-synchronization are suitable for integrating learning into an entertaining everyday-life activity, but the design should be optimized to provide a more seamless experience.

MCML Authors
Link to website

Maximiliane Windl

Human-Centered Ubiquitous Media


[1237]
J. Senoner, S. Schallmoser, B. Kratzwald, S. Feuerriegel and T. Netland.
Explainable AI improves task performance in human–AI collaboration.
Scientific Reports 14.31150 (Dec. 2024). DOI
Abstract

Artificial intelligence (AI) provides considerable opportunities to assist human work. However, one crucial challenge of human-AI collaboration is that many AI algorithms operate in a black-box manner where the way the AI makes predictions remains opaque. This makes it difficult for humans to validate a prediction made by AI against their own domain knowledge. For this reason, we hypothesize that augmenting humans with explainable AI as a decision aid improves task performance in human-AI collaboration. To test this hypothesis, we analyze the effect of augmenting domain experts with explainable AI in the form of visual heatmaps. We then compare participants who were either supported by (a) black-box AI or (b) explainable AI, where the latter supports them to follow AI predictions when the AI is accurate or overrule the AI when the AI predictions are wrong. We conducted two preregistered experiments with representative, real-world visual inspection tasks from manufacturing and medicine. The first experiment was conducted with factory workers from an electronics factory, who performed N=9,600 assessments of whether electronic products have defects. The second experiment was conducted with radiologists, who performed N=5,650 assessments of chest X-ray images to identify lung lesions. The results of our experiments with domain experts performing real-world tasks show that task performance improves when participants are supported by explainable AI instead of black-box AI. For example, in the manufacturing setting, we find that augmenting participants with explainable AI (as opposed to black-box AI) leads to a five-fold decrease in the median error rate of human decisions, which gives a significant improvement in task performance.

MCML Authors
Link to website

Simon Schallmoser

Artificial Intelligence in Management

Link to Profile Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1236]
A. H. Berger, L. Lux, A. Weers, M. Menten, D. Rückert and J. C. Paetzold.
Pitfalls of topology-aware image segmentation.
Preprint (Dec. 2024). arXiv
Abstract

Topological correctness, i.e., the preservation of structural integrity and specific characteristics of shape, is a fundamental requirement for medical imaging tasks, such as neuron or vessel segmentation. Despite the recent surge in topology-aware methods addressing this challenge, their real-world applicability is hindered by flawed benchmarking practices. In this paper, we identify critical pitfalls in model evaluation that include inadequate connectivity choices, overlooked topological artifacts in ground truth annotations, and inappropriate use of evaluation metrics. Through detailed empirical analysis, we uncover these issues’ profound impact on the evaluation and ranking of segmentation methods. Drawing from our findings, we propose a set of actionable recommendations to establish fair and robust evaluation standards for topology-aware medical image segmentation methods.

MCML Authors
Link to website

Laurin Lux

Artificial Intelligence in Healthcare and Medicine

Link to Profile Martin Menten

Martin Menten

Dr.

AI in Medicine

Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[1235]
A. Baumann, R. Li, M. Klasson, S. Mentu, S. Karthik, Z. Akata, A. Solin and M. Trapp.
Post-hoc Probabilistic Vision-Language Models.
Preprint (Dec. 2024). arXiv
Abstract

Vision-language models (VLMs), such as CLIP and SigLIP, have found remarkable success in classification, retrieval, and generative tasks. For this, VLMs deterministically map images and text descriptions to a joint latent space in which their similarity is assessed using the cosine similarity. However, a deterministic mapping of inputs fails to capture uncertainties over concepts arising from domain shifts when used in downstream tasks. In this work, we propose post-hoc uncertainty estimation in VLMs that does not require additional training. Our method leverages a Bayesian posterior approximation over the last layers in VLMs and analytically quantifies uncertainties over cosine similarities. We demonstrate its effectiveness for uncertainty quantification and support set selection in active learning. Compared to baselines, we obtain improved and well-calibrated predictive uncertainties, interpretable uncertainty estimates, and sample-efficient active learning. Our results show promise for safety-critical applications of large-scale models.

MCML Authors
Link to Profile Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1234]
J. Baumsteiger, L. Celiberti, P. Rinke, M. Todorović and C. Franchini.
Exploring Noncollinear Magnetic Energy Landscapes with Bayesian Optimization.
Preprint (Dec. 2024). arXiv
Abstract

The investigation of magnetic energy landscapes and the search for ground states of magnetic materials using ab initio methods like density functional theory (DFT) is a challenging task. Complex interactions, such as superexchange and spin-orbit coupling, make these calculations computationally expensive and often lead to non-trivial energy landscapes. Consequently, a comprehensive and systematic investigation of large magnetic configuration spaces is often impractical. We approach this problem by utilizing Bayesian Optimization, an active machine learning scheme that has proven to be efficient in modeling unknown functions and finding global minima. Using this approach we can obtain the magnetic contribution to the energy as a function of one or more spin canting angles with relatively small numbers of DFT calculations. To assess the capabilities and the efficiency of the approach we investigate the noncollinear magnetic energy landscapes of selected materials containing 3d, 5d and 5f magnetic ions: Ba3MnNb2O9, LaMn2Si2, β-MnO2, Sr2IrO4, UO2 and Ba2NaOsO6. By comparing our results to previous ab initio studies that followed more conventional approaches, we observe significant improvements in efficiency.

MCML Authors
Link to Profile Patrick Rinke

Patrick Rinke

Prof. Dr.

AI-based Material Science


[1233]
B. Chen, S. Peng, A. Korhonen and B. Plank.
A Rose by Any Other Name: LLM-Generated Explanations Are Good Proxies for Human Explanations to Collect Label Distributions on NLI.
Preprint (Dec. 2024). arXiv
Abstract

Disagreement in human labeling is ubiquitous and can be captured in human judgment distributions (HJDs). Recent research has shown that explanations provide valuable information for understanding human label variation (HLV) and that large language models (LLMs) can approximate HJDs from a few human-provided label-explanation pairs. However, collecting explanations for every label is still time-consuming. This paper examines whether LLMs can be used to replace humans in generating explanations for approximating HJDs. Specifically, we use LLMs as annotators to generate model explanations for a few given human labels. We test ways to obtain and combine these label-explanations with the goal of approximating the human judgment distribution. We further compare the resulting human- and model-generated explanations, and test automatic and human explanation selection. Our experiments show that LLM explanations are promising for NLI: to estimate HJDs, generated explanations yield results comparable to human explanations when provided with human labels. Importantly, our results generalize from datasets with human explanations to i) datasets where they are not available and ii) challenging out-of-distribution test sets.

MCML Authors
Link to website

Beiduo Chen

Artificial Intelligence and Computational Linguistics

Link to website

Siyao Peng

Dr.

Artificial Intelligence and Computational Linguistics

Link to Profile Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1232]
S. Dziadzio, V. Udandarao, K. Roth, A. Prabhu, Z. Akata, S. Albanie and M. Bethge.
How to Merge Your Multimodal Models Over Time?
Preprint (Dec. 2024). arXiv
Abstract

Model merging combines multiple expert models - finetuned from a base foundation model on diverse tasks and domains - into a single, more capable model. However, most existing model merging approaches assume that all experts are available simultaneously. In reality, new tasks and domains emerge progressively over time, requiring strategies to integrate the knowledge of expert models as they become available: a process we call temporal model merging. The temporal dimension introduces unique challenges not addressed in prior work, raising new questions such as: when training for a new task, should the expert model start from the merged past experts or from the original base model? Should we merge all models at each time step? Which merging techniques are best suited for temporal merging? Should different strategies be used to initialize the training and deploy the model? To answer these questions, we propose a unified framework called TIME - Temporal Integration of Model Expertise - which defines temporal model merging across three axes: (1) Initialization Phase, (2) Deployment Phase, and (3) Merging Technique. Using TIME, we study temporal model merging across model sizes, compute budgets, and learning horizons on the FoMo-in-Flux benchmark. Our comprehensive suite of experiments across TIME allows us to uncover key insights for temporal model merging, offering a better understanding of current challenges and best practices for effective temporal model merging.

MCML Authors
Link to Profile Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1231]
M. Fischer, P. Neher, P. J. Schüffler, S. Ziegler, S. Xiao, R. Peretzke, D. Clunie, C. Ulrich, M. Baumgartner, A. Muckenhuber, S. Dias Almeida, M. Götz, J. Kleesiek, M. Nolden, R. Braren and K. Maier-Hein.
Unlocking the Potential of Digital Pathology: Novel Baselines for Compression.
Preprint (Dec. 2024). arXiv
Abstract

Digital pathology offers a groundbreaking opportunity to transform clinical practice in histopathological image analysis, yet faces a significant hurdle: the substantial file sizes of pathological Whole Slide Images (WSI). While current digital pathology solutions rely on lossy JPEG compression to address this issue, lossy compression can introduce color and texture disparities, potentially impacting clinical decision-making. While prior research addresses perceptual image quality and downstream performance independently of each other, we jointly evaluate compression schemes for perceptual and downstream task quality on four different datasets. In addition, we collect an initially uncompressed dataset for an unbiased perceptual evaluation of compression schemes. Our results show that deep learning models fine-tuned for perceptual quality outperform conventional compression schemes like JPEG-XL or WebP for further compression of WSI. However, they exhibit a significant bias towards the compression artifacts present in the training data and struggle to generalize across various compression schemes. We introduce a novel evaluation metric based on feature similarity between original files and compressed files that aligns very well with the actual downstream performance on the compressed WSI. Our metric allows for a general and standardized evaluation of lossy compression schemes and mitigates the requirement to independently assess different downstream tasks. Our study provides novel insights for the assessment of lossy compression schemes for WSI and encourages a unified evaluation of lossy compression schemes to accelerate the clinical uptake of digital pathology.

MCML Authors
Link to Profile Peter Schüffler

Peter Schüffler

Prof. Dr.

Computational Pathology


[1230]
F. Fumagalli, M. Muschalik, E. Hüllermeier, B. Hammer and J. Herbinger.
Unifying Feature-Based Explanations with Functional ANOVA and Cooperative Game Theory.
Preprint (Dec. 2024). arXiv
Abstract

Feature-based explanations, using perturbations or gradients, are a prevalent tool to understand decisions of black box machine learning models. Yet, differences between these methods still remain mostly unknown, which limits their applicability for practitioners. In this work, we introduce a unified framework for local and global feature-based explanations using two well-established concepts: functional ANOVA (fANOVA) from statistics, and the notion of value and interaction from cooperative game theory. We introduce three fANOVA decompositions that determine the influence of feature distributions, and use game-theoretic measures, such as the Shapley value and interactions, to specify the influence of higher-order interactions. Our framework combines these two dimensions to uncover similarities and differences between a wide range of explanation techniques for features and groups of features. We then empirically showcase the usefulness of our framework on synthetic and real-world datasets.

MCML Authors
Link to website

Maximilian Muschalik

Artificial Intelligence & Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[1229]
F. Fundel, J. Schusterbauer, V. T. Hu and B. Ommer.
Distillation of Diffusion Features for Semantic Correspondence.
Preprint (Dec. 2024). arXiv
Abstract

Semantic correspondence, the task of determining relationships between different parts of images, underpins various applications including 3D reconstruction, image-to-image translation, object tracking, and visual place recognition. Recent studies have begun to explore representations learned in large generative image models for semantic correspondence, demonstrating promising results. Building on this progress, current state-of-the-art methods rely on combining multiple large models, resulting in high computational demands and reduced efficiency. In this work, we address this challenge by proposing a more computationally efficient approach. We propose a novel knowledge distillation technique to overcome the problem of reduced efficiency. We show how to use two large vision foundation models and distill the capabilities of these complementary models into one smaller model that maintains high accuracy at reduced computational cost. Furthermore, we demonstrate that by incorporating 3D data, we are able to further improve performance, without the need for human-annotated correspondences. Overall, our empirical results demonstrate that our distilled model with 3D data augmentation achieves performance superior to current state-of-the-art methods while significantly reducing computational load and enhancing practicality for real-world applications, such as semantic video correspondence. Our code and weights are publicly available on our project page.

MCML Authors
Link to Profile Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[1228]
J. Homer, O. Friedrich and D. Grün.
Simulation-based inference has its own Dodelson-Schneider effect (but it knows that it does).
Preprint (Dec. 2024). arXiv
Abstract

Making inferences about physical properties of the Universe requires knowledge of the data likelihood. A Gaussian distribution is commonly assumed for the uncertainties, with a covariance matrix estimated from a set of simulations. The noise in such covariance estimates causes two problems: it distorts the width of the parameter contours, and it adds scatter to the location of those contours which is not captured by the widths themselves. For non-Gaussian likelihoods, an approximation may be derived via Simulation-Based Inference (SBI). It is often implicitly assumed that parameter constraints from SBI analyses, which do not use covariance matrices, are not affected by the same problems as parameter estimation with a covariance matrix estimated from simulations. We investigate whether SBI suffers from effects similar to those of covariance estimation in Gaussian likelihoods. We use Neural Posterior and Likelihood Estimation with continuous and masked autoregressive normalizing flows for density estimation. We fit our approximate posterior models to simulations drawn from a Gaussian linear model, so that the SBI result can be compared to the true posterior. We test linear and neural network based compression, demonstrating that neither method circumvents the issues of covariance estimation. SBI suffers an inflation of posterior variance that is equal to or greater than the analytical result in covariance estimation for Gaussian likelihoods for the same number of simulations. The assumption that SBI requires a smaller number of simulations than covariance estimation for a Gaussian likelihood analysis is inaccurate. The limitations of traditional likelihood analysis with simulation-based covariance remain for SBI with a finite simulation budget. Despite these issues, we show that SBI correctly draws the true posterior contour given enough simulations.

MCML Authors
Link to website

Jed Homer

Astrophysics, Cosmology and Artificial Intelligence

Link to Profile Daniel Grün

Daniel Grün

Prof. Dr.

Astrophysics, Cosmology and Artificial Intelligence


[1227]
V. T. Hu and B. Ommer.
[MASK] is All You Need.
Preprint (Dec. 2024). arXiv
Abstract

In generative models, two paradigms have gained traction in various applications: next-set prediction-based Masked Generative Models and next-noise prediction-based Non-Autoregressive Models, e.g., Diffusion Models. In this work, we propose using discrete-state models to connect them and explore their scalability in the vision domain. First, we conduct a step-by-step analysis in a unified design space across two types of models, covering timestep-independence, noise schedule, temperature, guidance strength, etc., in a scalable manner. Second, we re-cast typical discriminative tasks, e.g., image segmentation, as an unmasking process from [MASK] tokens on a discrete-state model. This enables us to perform various sampling processes, including flexible conditional sampling by only training once to model the joint distribution. All aforementioned explorations lead to our framework named Discrete Interpolants, which enables us to achieve state-of-the-art or competitive performance compared to previous discrete-state based methods in various benchmarks, like ImageNet256, MS COCO, and the video dataset FaceForensics. In summary, by leveraging [MASK] in discrete-state models, we can bridge Masked Generative and Non-autoregressive Diffusion models, as well as generative and discriminative tasks.

MCML Authors
Link to Profile Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[1226]
S. Kim, R. Xiao, M.-I. Georgescu, S. Alaniz and Z. Akata.
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training.
Preprint (Dec. 2024). arXiv
Abstract

Vision-Language Models (VLMs) trained with contrastive loss have achieved significant advancements in various vision and language tasks. However, the global nature of contrastive loss makes VLMs focus predominantly on foreground objects, neglecting other crucial information in the image, which limits their effectiveness in downstream tasks. To address these challenges, we propose COSMOS: CrOSs-MOdality Self-distillation for vision-language pre-training that integrates a novel text-cropping strategy and cross-attention module into a self-supervised learning framework. We create global and local views of images and texts (i.e., multi-modal augmentations), which are essential for self-distillation in VLMs. We further introduce a cross-attention module, enabling COSMOS to learn comprehensive cross-modal representations optimized via a cross-modality self-distillation loss. COSMOS consistently outperforms previous strong baselines on various zero-shot downstream tasks, including retrieval, classification, and semantic segmentation. Additionally, it surpasses CLIP-based models trained on larger datasets in visual perception and contextual understanding tasks.

MCML Authors
Link to Profile Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1225]
J. Külz, M. Terzer, M. Magri, A. Giusti and M. Althoff.
Holistic Construction Automation with Modular Robots: From High-Level Task Specification to Execution.
Preprint (Dec. 2024). arXiv
Abstract

In situ robotic automation in construction is challenging due to constantly changing environments, a shortage of robotic experts, and a lack of standardized frameworks bridging robotics and construction practices. This work proposes a holistic framework for construction task specification, optimization of robot morphology, and mission execution using a mobile modular reconfigurable robot. Users can specify and monitor the desired robot behavior through a graphical interface. Our framework identifies an optimized robot morphology and enables automatic real-world execution by integrating Building Information Modelling (BIM). By leveraging modular robot components, we ensure seamless and fast adaptation to the specific demands of the construction task. Experimental validation demonstrates that our approach robustly enables the autonomous execution of robotic drilling.

MCML Authors
Link to website

Jonathan Külz

Cyber Physical Systems

Link to Profile Matthias Althoff

Matthias Althoff

Prof. Dr.

Cyber Physical Systems


[1224]
W. Li, W. Chen, S. Qian, J. Chen, D. Cremers and H. Li.
DynSUP: Dynamic Gaussian Splatting from An Unposed Image Pair.
Preprint (Dec. 2024). arXiv GitHub

MCML Authors
Link to website

Weirong Chen

Computer Vision & Artificial Intelligence

Link to website

Shenhan Qian

Computer Vision & Artificial Intelligence

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence

Link to website

Haoang Li

Dr.

* Former member


[1223]
Y. Li, M. Milling, L. Specia and B. W. Schuller.
From Audio Deepfake Detection to AI-Generated Music Detection – A Pathway and Overview.
Preprint (Dec. 2024). arXiv
Abstract

As Artificial Intelligence (AI) technologies continue to evolve, their use in generating realistic, contextually appropriate content has expanded into various domains. Music, an art form and medium for entertainment deeply rooted in human culture, is seeing increased involvement of AI in its production. However, despite the effective application of AI music generation (AIGM) tools, their unregulated use raises concerns about potential negative impacts on the music industry, copyright, and artistic integrity, underscoring the importance of effective AIGM detection. This paper provides an overview of existing AIGM detection methods. To lay a foundation for the general workings and challenges of AIGM detection, we first review general principles of AIGM, including recent advancements in audio deepfakes, as well as multimodal detection techniques. We further propose a potential pathway for leveraging foundation models from audio deepfake detection to AIGM detection. Additionally, we discuss the implications of these tools and propose directions for future research to address ongoing challenges in the field.

MCML Authors
Link to Profile Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[1222]
S. Liang, S. Wang, K. Li, M. Niemeyer, S. Gasperini, N. Navab and F. Tombari.
SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians.
Preprint (Dec. 2024). arXiv
Abstract

3D Gaussian Splatting has recently gained traction for its efficient training and real-time rendering. While the vanilla Gaussian Splatting representation is mainly designed for view synthesis, more recent works have investigated how to extend it with scene understanding and language features. However, existing methods lack a detailed comprehension of scenes, limiting their ability to segment and interpret complex structures. To this end, we introduce SuperGSeg, a novel approach that fosters cohesive, context-aware scene representation by disentangling segmentation and language field distillation. SuperGSeg first employs neural Gaussians to learn instance and hierarchical segmentation features from multi-view images with the aid of off-the-shelf 2D masks. These features are then leveraged to create a sparse set of what we call Super-Gaussians. Super-Gaussians facilitate the distillation of 2D language features into 3D space. Through Super-Gaussians, our method enables high-dimensional language feature rendering without extreme increases in GPU memory. Extensive experiments demonstrate that SuperGSeg outperforms prior works on both open-vocabulary object localization and semantic segmentation tasks.

MCML Authors
Link to website

Kunyi Li

Computer Aided Medical Procedures & Augmented Reality

Link to website

Stefano Gasperini

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[1221]
B. Ma, B. Yoztyurk, A.-C. Haensch, X. Wang, M. Herklotz, F. Kreuter, B. Plank and M. Assenmacher.
Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study.
Preprint (Dec. 2024). arXiv
Abstract

In recent research, large language models (LLMs) have been increasingly used to investigate public opinions. This study examines the algorithmic fidelity of LLMs, i.e., their ability to replicate the socio-cultural context and nuanced opinions of human participants. Using open-ended survey data from the German Longitudinal Election Studies (GLES), we prompt different LLMs to generate synthetic public opinions reflective of German subpopulations by incorporating demographic features into the persona prompts. Our results show that Llama performs better than other LLMs at representing subpopulations, particularly when there is lower opinion diversity within those groups. Our findings further reveal that the LLM performs better for supporters of left-leaning parties like The Greens and The Left compared to other parties, and matches least with the right-wing party AfD. Additionally, the inclusion or exclusion of specific variables in the prompts can significantly impact the models' predictions. These findings underscore the importance of aligning LLMs to more effectively model diverse public opinions while minimizing political biases and enhancing robustness in representativeness.

MCML Authors
Link to website

Bolei Ma

Social Data Science and AI Lab

Link to website

Xinpeng Wang

Artificial Intelligence and Computational Linguistics

Link to Profile Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab

Link to Profile Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1220]
Y. Mansour and R. Heckel.
Measuring Bias of Web-filtered Text Datasets and Bias Propagation Through Training.
Preprint (Dec. 2024). arXiv
Abstract

We investigate biases in pretraining datasets for large language models (LLMs) through dataset classification experiments. Building on prior work demonstrating the existence of biases in popular computer vision datasets, we analyze popular open-source pretraining datasets for LLMs derived from CommonCrawl including C4, RefinedWeb, DolmaCC, RedPajama-V2, FineWeb, and DCLM-Baseline. Despite those datasets being obtained with similar filtering and deduplication steps, neural networks can classify surprisingly well which dataset a single text sequence belongs to, significantly better than a human can. This indicates that popular pretraining datasets have their own unique biases or fingerprints. Those biases remain even when the text is rewritten with LLMs. Moreover, these biases propagate through training: Random sequences generated by models trained on those datasets can be classified well by a classifier trained on the original datasets.

MCML Authors
Link to Profile Reinhard Heckel

Reinhard Heckel

Prof. Dr.

Machine Learning


[1219]
P. Pisal, O. Krejci and P. Rinke.
Machine-learning Accelerated Descriptor Design for Catalyst Discovery: A CO2 to Methanol Conversion Case Study.
Preprint (Dec. 2024). arXiv
Abstract

Transforming CO2 into methanol represents a crucial step towards closing the carbon cycle, with thermoreduction technology nearing industrial application. However, obtaining high methanol yields and ensuring the stability of heterocatalysts remain significant challenges. Herein, we present a sophisticated computational framework to accelerate the discovery of novel thermal heterogeneous catalysts, using machine-learned force fields. We propose a new catalytic descriptor, termed adsorption energy distribution, that aggregates the binding energies for different catalyst facets, binding sites, and adsorbates. The descriptor is versatile and can easily be adjusted to a specific reaction through careful choice of the key-step reactants and reaction intermediates. By applying unsupervised machine learning and statistical analysis to a dataset comprising nearly 160 metallic alloys, we offer a powerful tool for catalyst discovery. Finally, we propose new promising candidate materials such as ZnRh and ZnPt3, which to our knowledge, have not yet been tested, and discuss their possible advantage in terms of stability.

MCML Authors
Link to Profile Patrick Rinke

Patrick Rinke

Prof. Dr.

AI-based Material Science


[1218]
A. Reithmeir, V. Spieker, V. Sideri-Lampretsa, D. Rückert, J. A. Schnabel and V. A. Zimmer.
From Model Based to Learned Regularization in Medical Image Registration: A Comprehensive Review.
Preprint (Dec. 2024). arXiv
Abstract

Image registration is fundamental in medical imaging applications, such as disease progression analysis or radiation therapy planning. The primary objective of image registration is to precisely capture the deformation between two or more images, typically achieved by minimizing an optimization problem. Due to its inherent ill-posedness, regularization is a key component in driving the solution toward anatomically meaningful deformations. A wide range of regularization methods has been proposed for both conventional and deep learning-based registration. However, the appropriate application of regularization techniques often depends on the specific registration problem, and no one-size-fits-all method exists. Despite its importance, regularization is often overlooked or addressed with default approaches, assuming existing methods are sufficient. A comprehensive and structured review remains missing. This review addresses this gap by introducing a novel taxonomy that systematically categorizes the diverse range of proposed regularization methods. It highlights the emerging field of learned regularization, which leverages data-driven techniques to automatically derive deformation properties from the data. Moreover, this review examines the transfer of regularization methods from conventional to learning-based registration, identifies open challenges, and outlines future research directions. By emphasizing the critical role of regularization in image registration, we hope to inspire the research community to reconsider regularization strategies in modern registration algorithms and to explore this rapidly evolving field further.

MCML Authors
Link to website

Anna Reithmeir

Computational Imaging and AI in Medicine

Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine

Link to Profile Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[1217]
M. Sabanayagam, L. Gosch, S. Günnemann and D. Ghoshdastidar.
Exact Certification of (Graph) Neural Networks Against Label Poisoning.
Preprint (Dec. 2024). arXiv
Abstract

Machine learning models are highly vulnerable to label flipping, i.e., the adversarial modification (poisoning) of training labels to compromise performance. Thus, deriving robustness certificates is important to guarantee that test predictions remain unaffected and to understand worst-case robustness behavior. However, for Graph Neural Networks (GNNs), the problem of certifying label flipping has so far been unsolved. We change this by introducing an exact certification method, deriving both sample-wise and collective certificates. Our method leverages the Neural Tangent Kernel (NTK) to capture the training dynamics of wide networks enabling us to reformulate the bilevel optimization problem representing label flipping into a Mixed-Integer Linear Program (MILP). We apply our method to certify a broad range of GNN architectures in node classification tasks. Thereby, concerning the worst-case robustness to label flipping: (i) we establish hierarchies of GNNs on different benchmark graphs; (ii) quantify the effect of architectural choices such as activations, depth and skip-connections; and surprisingly, (iii) uncover a novel phenomenon of the robustness plateauing for intermediate perturbation budgets across all investigated datasets and architectures. While we focus on GNNs, our certificates are applicable to sufficiently wide NNs in general through their NTK. Thus, our work presents the first exact certificate to a poisoning attack ever derived for neural networks, which could be of independent interest.

MCML Authors
Link to website

Lukas Gosch

Data Analytics & Machine Learning

Link to Profile Stephan Günnemann

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning


[1216]
C. Sauer, A.-L. Boulesteix, L. Hanßum, F. Hodiamont, C. Bausewein and T. Ullmann.
Beyond algorithm hyperparameters: on preprocessing hyperparameters and associated pitfalls in machine learning applications.
Preprint (Dec. 2024). arXiv
Abstract

Adequately generating and evaluating prediction models based on supervised machine learning (ML) is often challenging, especially for less experienced users in applied research areas. Special attention is required in settings where the model generation process involves hyperparameter tuning, i.e. data-driven optimization of different types of hyperparameters to improve the predictive performance of the resulting model. Discussions about tuning typically focus on the hyperparameters of the ML algorithm (e.g., the minimum number of observations in each terminal node for a tree-based algorithm). In this context, it is often neglected that hyperparameters also exist for the preprocessing steps that are applied to the data before it is provided to the algorithm (e.g., how to handle missing feature values in the data). As a consequence, users experimenting with different preprocessing options to improve model performance may be unaware that this constitutes a form of hyperparameter tuning - albeit informal and unsystematic - and thus may fail to report or account for this optimization. To illuminate this issue, this paper reviews and empirically illustrates different procedures for generating and evaluating prediction models, explicitly addressing the different ways algorithm and preprocessing hyperparameters are typically handled by applied ML users. By highlighting potential pitfalls, especially those that may lead to exaggerated performance claims, this review aims to further improve the quality of predictive modeling in ML applications.
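A minimal NumPy sketch (hypothetical data and model, not taken from the paper) of the protocol the authors advocate: preprocessing hyperparameters such as imputation statistics are fitted on the training fold only, then applied unchanged to the test fold, so that the informal tuning of preprocessing never leaks into the evaluation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(float)
X[rng.random(X.shape) < 0.2] = np.nan   # inject missing feature values

def impute_mean(X, means=None):
    """Column-wise mean imputation; passing `means` reuses statistics
    fitted on the training fold instead of refitting on test data."""
    if means is None:
        means = np.nanmean(X, axis=0)
    X = X.copy()
    for j in range(X.shape[1]):
        X[np.isnan(X[:, j]), j] = means[j]
    return X, means

# The imputation means are themselves tuned quantities: estimate them on
# the training fold only and apply them, frozen, to the test fold.
fold = np.arange(len(X)) % 5
accs = []
for k in range(5):
    tr, te = fold != k, fold == k
    X_tr, means = impute_mean(X[tr])            # fit preprocessing on train
    X_te, _ = impute_mean(X[te], means=means)   # apply without refitting
    w = np.linalg.lstsq(X_tr, 2 * y[tr] - 1, rcond=None)[0]  # toy linear model
    accs.append(float(np.mean((X_te @ w > 0) == (y[te] > 0))))
print(round(sum(accs) / 5, 3))
```

Computing the column means on the full dataset before splitting would be the pitfall the paper warns about: an unreported, unsystematic form of hyperparameter tuning.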

MCML Authors
Link to website

Christina Sauer (née Nießl)

Biometry in Molecular Medicine

Link to Profile Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

Link to website

Theresa Ullmann

Dr.

Biometry in Molecular Medicine


[1215]
N. Stracke, S. A. Baumann, K. Bauer, F. Fundel and B. Ommer.
CleanDIFT: Diffusion Features without Noise.
Preprint (Dec. 2024). arXiv
Abstract

Internal features from large-scale pre-trained diffusion models have recently been established as powerful semantic descriptors for a wide range of downstream tasks. Works that use these features generally need to add noise to images before passing them through the model to obtain the semantic features, as the models do not offer the most useful features when given images with little to no noise. We show that this noise has a critical impact on the usefulness of these features that cannot be remedied by ensembling with different random noises. We address this issue by introducing a lightweight, unsupervised fine-tuning method that enables diffusion backbones to provide high-quality, noise-free semantic features. We show that these features readily outperform previous diffusion features by a wide margin in a wide variety of extraction setups and downstream tasks, offering better performance than even ensemble-based methods at a fraction of the cost.

MCML Authors
Link to Profile Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[1214]
Q. Sun, Y. Li, E. Alturki, S. M. K. Murthy and B. W. Schuller.
Towards Friendly AI: A Comprehensive Review and New Perspectives on Human-AI Alignment.
Preprint (Dec. 2024). arXiv
Abstract

As Artificial Intelligence (AI) continues to advance rapidly, Friendly AI (FAI) has been proposed to advocate for more equitable and fair development of AI. Despite its importance, there is a lack of comprehensive reviews examining FAI from an ethical perspective, as well as limited discussion on its potential applications and future directions. This paper addresses these gaps by providing a thorough review of FAI, focusing on theoretical perspectives both for and against its development, and presenting a formal definition in a clear and accessible format. Key applications are discussed from the perspectives of eXplainable AI (XAI), privacy, fairness and affective computing (AC). Additionally, the paper identifies challenges in current technological advancements and explores future research avenues. The findings emphasise the significance of developing FAI and advocate for its continued advancement to ensure ethical and beneficial AI development.

MCML Authors
Link to Profile Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[1213]
A. Testoni, B. Plank and R. Fernández.
RACQUET: Unveiling the Dangers of Overlooked Referential Ambiguity in Visual LLMs.
Preprint (Dec. 2024). arXiv
Abstract

Ambiguity resolution is key to effective communication. While humans effortlessly address ambiguity through conversational grounding strategies, the extent to which current language models can emulate these strategies remains unclear. In this work, we examine referential ambiguity in image-based question answering by introducing RACQUET, a carefully curated dataset targeting distinct aspects of ambiguity. Through a series of evaluations, we reveal significant limitations and problems of overconfidence of state-of-the-art large multimodal language models in addressing ambiguity in their responses. The overconfidence issue becomes particularly relevant for RACQUET-BIAS, a subset designed to analyze a critical yet underexplored problem: failing to address ambiguity leads to stereotypical, socially biased responses. Our results underscore the urgency of equipping models with robust strategies to deal with uncertainty without resorting to undesirable stereotypes.

MCML Authors
Link to Profile Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1212]
J. Wang, Z. Qin, Y. Zhang, V. Hu, B. Ommer, R. Briq and S. Kesselheim.
Scaling Image Tokenizers with Grouped Spherical Quantization.
Preprint (Dec. 2024). arXiv
Abstract

Vision tokenizers have attracted significant attention due to their scalability and compactness; previous works depend on old-school GAN-based hyperparameters, biased comparisons, and a lack of comprehensive analysis of the scaling behaviours. To tackle those issues, we introduce Grouped Spherical Quantization (GSQ), featuring spherical codebook initialization and lookup regularization to constrain codebook latent to a spherical surface. Our empirical analysis of image tokenizer training strategies demonstrates that GSQ-GAN achieves superior reconstruction quality over state-of-the-art methods with fewer training iterations, providing a solid foundation for scaling studies. Building on this, we systematically examine the scaling behaviours of GSQ, specifically in latent dimensionality, codebook size, and compression ratios, and their impact on model performance. Our findings reveal distinct behaviours at high and low spatial compression levels, underscoring challenges in representing high-dimensional latent spaces. We show that GSQ can restructure high-dimensional latent into compact, low-dimensional spaces, thus enabling efficient scaling with improved quality. As a result, GSQ-GAN achieves a 16x down-sampling with a reconstruction FID (rFID) of 0.50.
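The core quantization step can be sketched in a few lines (a simplified illustration, not the paper's trained tokenizer): split each latent vector into groups, project groups and codebook entries onto the unit sphere, and snap every group to its nearest code by cosine similarity.

```python
import numpy as np

def gsq_quantize(z, codebook):
    """Sketch of grouped spherical quantization: `codebook` has shape
    (groups, codebook_size, group_dim); each latent in `z` is split into
    groups, both sides are L2-normalized onto the sphere, and every group
    is replaced by its most similar code."""
    n, d = z.shape
    g, _, dg = codebook.shape
    assert d == g * dg, "latent dim must factor into groups"
    z_g = z.reshape(n, g, dg)
    z_g = z_g / np.linalg.norm(z_g, axis=-1, keepdims=True)          # spherical latents
    cb = codebook / np.linalg.norm(codebook, axis=-1, keepdims=True)  # spherical codes
    sim = np.einsum('ngd,gkd->ngk', z_g, cb)     # per-group cosine similarity
    idx = sim.argmax(axis=-1)                    # (n, groups) selected code indices
    zq = cb[np.arange(g)[None, :], idx]          # gather codes -> (n, groups, dg)
    return zq.reshape(n, d), idx

rng = np.random.default_rng(0)
codebook = rng.normal(size=(4, 16, 8))   # 4 groups, 16 codes, 8 dims per group
z = rng.normal(size=(5, 32))
zq, idx = gsq_quantize(z, codebook)
print(zq.shape, idx.shape)   # (5, 32) (5, 4)
```

Grouping keeps each lookup low-dimensional, which is what lets the codebook scale without the collapse issues the abstract alludes to.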

MCML Authors
Link to Profile Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[1211]
Y. Wang, Q. Song, D. Wasif, M. Shahzad, C. Koller, J. Bamber and X. Zhu.
How Certain are Uncertainty Estimates? Three Novel Earth Observation Datasets for Benchmarking Uncertainty Quantification in Machine Learning.
Preprint (Dec. 2024). arXiv GitHub
Abstract

Uncertainty quantification (UQ) is essential for assessing the reliability of Earth observation (EO) products. However, the extensive use of machine learning models in EO introduces an additional layer of complexity, as those models themselves are inherently uncertain. While various UQ methods do exist for machine learning models, their performance on EO datasets remains largely unevaluated. A key challenge in the community is the absence of the ground truth for uncertainty, i.e. how certain the uncertainty estimates are, apart from the labels for the image/signal. This article fills this gap by introducing three benchmark datasets specifically designed for UQ in EO machine learning models. These datasets address three common problem types in EO: regression, image segmentation, and scene classification. They enable a transparent comparison of different UQ methods for EO machine learning models. We describe the creation and characteristics of each dataset, including data sources, preprocessing steps, and label generation, with a particular focus on calculating the reference uncertainty. We also showcase baseline performance of several machine learning models on each dataset, highlighting the utility of these benchmarks for model development and comparison. Overall, this article offers a valuable resource for researchers and practitioners working in artificial intelligence for EO, promoting a more accurate and reliable quality measure of the outputs of machine learning models.

MCML Authors
Link to Profile Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[1210]
J. Weidner, M. Balcerak, I. Ezhov, A. Datchev, L. Lux, L. Zimmer, D. Rückert, B. Menze and B. Wiestler.
Spatial Brain Tumor Concentration Estimation for Individualized Radiotherapy Planning.
Preprint (Dec. 2024). arXiv
Abstract

Biophysical modeling of brain tumors has emerged as a promising strategy for personalizing radiotherapy planning by estimating the otherwise hidden distribution of tumor cells within the brain. However, many existing state-of-the-art methods are computationally intensive, limiting their widespread translation into clinical practice. In this work, we propose an efficient and direct method that utilizes soft physical constraints to estimate the tumor cell concentration from preoperative MRI of brain tumor patients. Our approach optimizes a 3D tumor concentration field by simultaneously minimizing the difference between the observed MRI and a physically informed loss function. Compared to existing state-of-the-art techniques, our method significantly improves predicting tumor recurrence on two public datasets with a total of 192 patients while maintaining a clinically viable runtime of under one minute - a substantial reduction from the 30 minutes required by the current best approach. Furthermore, we showcase the generalizability of our framework by incorporating additional imaging information and physical constraints, highlighting its potential to translate to various medical diffusion phenomena with imperfect data.
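As a toy 1D analogue of this idea (illustrative assumptions throughout, not the paper's 3D model or its physics loss), one can estimate a concentration field by minimizing a data term on the observed voxels plus a soft diffusion-style smoothness constraint, here solved in closed form since the surrogate objective is quadratic.

```python
import numpy as np

def estimate_concentration(obs, mask, lam=1.0):
    """Recover a 1D concentration field c by minimizing
        ||mask * (c - obs)||^2 + lam * ||L c||^2,
    where L is the discrete Laplacian acting as a soft physical
    (diffusion/smoothness) constraint. The quadratic is solved directly."""
    n = len(obs)
    L = -2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    A = np.diag(mask) + lam * (L.T @ L)     # positive definite normal matrix
    c = np.linalg.solve(A, mask * obs)
    return np.clip(c, 0.0, 1.0)             # concentrations live in [0, 1]

x = np.arange(40)
obs = np.exp(-0.5 * ((x - 20) / 4.0) ** 2)   # synthetic "tumor" profile
mask = (x % 2 == 0).astype(float)            # only every other voxel observed
c = estimate_concentration(obs, mask)
print(c.shape)
```

The physics prior fills in the unobserved voxels smoothly; the paper's contribution is doing the analogous optimization in 3D, from real MRI, fast enough for clinical use.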

MCML Authors
Link to website

Laurin Lux

Artificial Intelligence in Healthcare and Medicine

Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine

Link to Profile Benedikt Wiestler

Benedikt Wiestler

Prof. Dr.

AI for Image-Guided Diagnosis and Therapy


[1209]
Y. Xia, Z. Li, Y.-J. Li, L. Shi, H. Cao, J. F. Henriques and D. Cremers.
UniLoc: Towards Universal Place Recognition Using Any Single Modality.
Preprint (Dec. 2024). arXiv GitHub
Abstract

To date, most place recognition methods focus on single-modality retrieval. While they perform well in specific environments, cross-modal methods offer greater flexibility by allowing seamless switching between map and query sources. It also promises to reduce computation requirements by having a unified model, and achieving greater sample efficiency by sharing parameters. In this work, we develop a universal solution to place recognition, UniLoc, that works with any single query modality (natural language, image, or point cloud). UniLoc leverages recent advances in large-scale contrastive learning, and learns by matching hierarchically at two levels: instance-level matching and scene-level matching. Specifically, we propose a novel Self-Attention based Pooling (SAP) module to evaluate the importance of instance descriptors when aggregated into a place-level descriptor. Experiments on the KITTI-360 dataset demonstrate the benefits of cross-modality for place recognition, achieving superior performance in cross-modal settings and competitive results also for uni-modal scenarios.
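The aggregation idea behind the proposed SAP module can be sketched as follows (a minimal illustration; the learnable query `w_query` and dot-product scoring are assumptions, not UniLoc's exact parametrization): score each instance descriptor, softmax the scores into importance weights, and pool into one place-level descriptor.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def self_attention_pool(instance_desc, w_query):
    """Sketch of self-attention based pooling: weight each instance
    descriptor by its learned importance before aggregating them into a
    single place-level descriptor."""
    scores = instance_desc @ w_query      # (num_instances,) importance scores
    weights = softmax(scores)             # normalized attention weights
    return weights @ instance_desc        # weighted sum -> (dim,) descriptor

rng = np.random.default_rng(0)
descs = rng.normal(size=(7, 16))   # 7 instance descriptors of dimension 16
w = rng.normal(size=16)            # stand-in for the module's learned query
place = self_attention_pool(descs, w)
print(place.shape)
```

Because the pooled descriptor lives in the same space regardless of the query modality, retrieval reduces to comparing place-level descriptors across language, image, and point-cloud inputs.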

MCML Authors
Link to website

Yan Xia

Dr.

Computer Vision & Artificial Intelligence

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1208]
Y. Xia, Y. Lu, R. Song, O. Dhaouadi, J. F. Henriques and D. Cremers.
TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes.
Preprint (Dec. 2024). arXiv GitHub
Abstract

We tackle the problem of localizing traffic surveillance cameras in cooperative perception. To overcome the lack of large-scale real-world intersection datasets, we introduce Carla Intersection, a new simulated dataset with 75 urban and rural intersections in Carla. Moreover, we introduce a novel neural network, TrafficLoc, localizing traffic cameras within a 3D reference map. TrafficLoc employs a coarse-to-fine matching pipeline. For image-point cloud feature fusion, we propose a novel Geometry-guided Attention Loss to address cross-modal viewpoint inconsistencies. During coarse matching, we propose an Inter-Intra Contrastive Learning to achieve precise alignment while preserving distinctiveness among local intra-features within image patch-point group pairs. Besides, we introduce Dense Training Alignment with a soft-argmax operator to consider additional features when regressing the final position. Extensive experiments show that our TrafficLoc improves the localization accuracy over the state-of-the-art image-to-point-cloud registration methods by a large margin (up to 86%) on Carla Intersection and generalizes well to real-world data. TrafficLoc also achieves new SOTA performance on KITTI and NuScenes datasets, demonstrating strong localization ability across both in-vehicle and traffic cameras.

MCML Authors
Link to website

Yan Xia

Dr.

Computer Vision & Artificial Intelligence

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1207]
R. Xiao, S. Kim, M.-I. Georgescu, Z. Akata and S. Alaniz.
FLAIR: VLM with Fine-grained Language-informed Image Representations.
Preprint (Dec. 2024). arXiv GitHub
Abstract

CLIP has shown impressive results in aligning images and texts at scale. However, its ability to capture detailed visual features remains limited because CLIP matches images and texts at a global level. To address this issue, we propose FLAIR, Fine-grained Language-informed Image Representations, an approach that utilizes long and detailed image descriptions to learn localized image embeddings. By sampling diverse sub-captions that describe fine-grained details about an image, we train our vision-language model to produce not only global embeddings but also text-specific image representations. Our model introduces text-conditioned attention pooling on top of local image tokens to produce fine-grained image representations that excel at retrieving detailed image content. We achieve state-of-the-art performance on both existing multimodal retrieval benchmarks and our newly introduced fine-grained retrieval task, which evaluates vision-language models’ ability to retrieve partial image content. Furthermore, our experiments demonstrate the effectiveness of FLAIR trained on 30M image-text pairs in capturing fine-grained visual information, including zero-shot semantic segmentation, outperforming models trained on billions of pairs.
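The text-conditioned pooling can be sketched as attention where the caption embedding acts as the query over local image tokens (shapes and the plain dot-product scoring are illustrative assumptions, not FLAIR's exact design):

```python
import numpy as np

def text_conditioned_pool(image_tokens, text_emb):
    """Sketch of text-conditioned attention pooling: the text embedding
    queries the local image tokens, so the pooled representation
    emphasizes the regions a (sub-)caption describes."""
    scores = image_tokens @ text_emb / np.sqrt(image_tokens.shape[1])
    w = np.exp(scores - scores.max())
    w /= w.sum()                       # softmax attention weights
    return w @ image_tokens            # text-specific image embedding

rng = np.random.default_rng(1)
tokens = rng.normal(size=(49, 32))     # e.g. a 7x7 grid of local image tokens
text = rng.normal(size=32)             # embedding of one detailed sub-caption
emb = text_conditioned_pool(tokens, text)
print(emb.shape)
```

Different sub-captions of the same image thus yield different pooled embeddings, which is what makes the representation "text-specific" rather than global.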

MCML Authors
Link to Profile Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1206]
X. Xue, G. Wei, H. Chen, H. Zhang, F. Lin, C. Shen and X. Zhu.
REO-VLM: Transforming VLM to Meet Regression Challenges in Earth Observation.
Preprint (Dec. 2024). arXiv
Abstract

The rapid evolution of Vision Language Models (VLMs) has catalyzed significant advancements in artificial intelligence, expanding research across various disciplines, including Earth Observation (EO). While VLMs have enhanced image understanding and data processing within EO, their applications have predominantly focused on image content description. This limited focus overlooks their potential in geographic and scientific regression tasks, which are essential for diverse EO applications. To bridge this gap, this paper introduces a novel benchmark dataset, called REO-Instruct, to unify regression and generation tasks specifically for the EO domain. Comprising 1.6 million multimodal EO imagery and language pairs, this dataset is designed to support both biomass regression and image content interpretation tasks. Leveraging this dataset, we develop REO-VLM, a groundbreaking model that seamlessly integrates regression capabilities with traditional generative functions. By utilizing language-driven reasoning to incorporate scientific domain knowledge, REO-VLM goes beyond solely relying on EO imagery, enabling comprehensive interpretation of complex scientific attributes from EO data. This approach establishes new performance benchmarks and significantly enhances the capabilities of environmental monitoring and resource management.

MCML Authors
Link to Profile Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[1205]
H. Ye, A. Wisiorek, A. Maronikolakis, Ö. Alaçam and H. Schütze.
A Federated Approach to Few-Shot Hate Speech Detection for Marginalized Communities.
Preprint (Dec. 2024). arXiv
Abstract

Hate speech online remains an understudied issue for marginalized communities, and has seen rising relevance, especially in the Global South, which includes developing societies with increasing internet penetration. In this paper, we aim to provide marginalized communities living in societies where the dominant language is low-resource with a privacy-preserving tool to protect themselves from hate speech on the internet by filtering offensive content in their native languages. Our contribution in this paper is twofold: 1) we release REACT (REsponsive hate speech datasets Across ConTexts), a collection of high-quality, culture-specific hate speech detection datasets comprising seven distinct target groups in eight low-resource languages, curated by experienced data collectors; 2) we propose a solution to few-shot hate speech detection utilizing federated learning (FL), a privacy-preserving and collaborative learning approach, to continuously improve a central model that exhibits robustness when tackling different target groups and languages. By keeping the training local to the users’ devices, we ensure the privacy of the users’ data while benefitting from the efficiency of federated learning. Furthermore, we personalize client models to target-specific training data and evaluate their performance. Our results indicate the effectiveness of FL across different target groups, whereas the benefits of personalization on few-shot learning are not clear.
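The federated learning scheme at the core of the proposal can be sketched with FedAvg on a toy classifier (hypothetical data and a plain logistic model, not the paper's hate-speech detector): each client trains locally on its private examples, and only weight updates reach the server.

```python
import numpy as np

def federated_round(global_w, client_data, lr=0.5, local_steps=20):
    """One FedAvg communication round: every client refines the global
    weights on its own (private) data with local gradient steps, and the
    server averages the resulting models. Raw data never leaves a client."""
    updates = []
    for X, y in client_data:
        w = global_w.copy()
        for _ in range(local_steps):
            p = 1 / (1 + np.exp(-(X @ w)))          # logistic predictions
            w -= lr * X.T @ (p - y) / len(y)        # local gradient step
        updates.append(w)
    return np.mean(updates, axis=0)                 # server-side aggregation

rng = np.random.default_rng(0)
clients = []
for _ in range(3):                                  # 3 clients sharing one task
    X = rng.normal(size=(30, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)
    clients.append((X, y))
w = np.zeros(2)
for _ in range(10):                                 # 10 communication rounds
    w = federated_round(w, clients)
acc = float(np.mean([((X @ w > 0) == (y > 0)).mean() for X, y in clients]))
print(round(acc, 2))
```

The paper additionally personalizes per-client models to target-specific data; in this sketch that would amount to keeping each client's locally refined `w` instead of only the average.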

MCML Authors
Link to website

Haotian Ye

Statistical NLP and Deep Learning

Link to website

Axel Wisiorek

Dr.

Statistical NLP and Deep Learning

Link to website

Antonis Maronikolakis

Statistical NLP and Deep Learning

Link to Profile Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1204]
Y. Yeganeh, I. Charisiadis, M. Hasny, M. Hartenberger, B. Ommer, N. Navab, A. Farshad and E. Adeli.
Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis.
Preprint (Dec. 2024). arXiv
Abstract

Scaling by training on large datasets has been shown to enhance the quality and fidelity of image generation and manipulation with diffusion models; however, such large datasets are not always accessible in medical imaging due to cost and privacy issues, which contradicts one of the main applications of such models to produce synthetic samples where real data is scarce. Also, finetuning on pre-trained general models has been a challenge due to the distribution shift between the medical domain and the pre-trained models. Here, we propose Latent Drift (LD) for diffusion models that can be adopted for any fine-tuning method to mitigate the issues faced by the distribution shift or employed in inference time as a condition. Latent Drifting enables diffusion models to be conditioned for medical images fitted for the complex task of counterfactual image generation, which is crucial to investigate how parameters such as gender, age, and adding or removing diseases in a patient would alter the medical images. We evaluate our method on three public longitudinal benchmark datasets of brain MRI and chest X-rays for counterfactual image generation. Our results demonstrate significant performance gains in various scenarios when combined with different fine-tuning schemes. The source code of this work will be publicly released upon its acceptance.

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality


[1203]
Y. Yeganeh, R. Xiao, G. Guvercin, N. Navab and A. Farshad.
Conformable Convolution for Topologically Aware Learning of Complex Anatomical Structures.
Preprint (Dec. 2024). arXiv
Abstract

While conventional computer vision emphasizes pixel-level and feature-based objectives, medical image analysis of intricate biological structures necessitates explicit representation of their complex topological properties. Despite their successes, deep learning models often struggle to accurately capture the connectivity and continuity of fine, sometimes pixel-thin, yet critical structures due to their reliance on implicit learning from data. Such shortcomings can significantly impact the reliability of analysis results and hinder clinical decision-making. To address this challenge, we introduce Conformable Convolution, a novel convolutional layer designed to explicitly enforce topological consistency. Conformable Convolution learns adaptive kernel offsets that preferentially focus on regions of high topological significance within an image. This prioritization is guided by our proposed Topological Posterior Generator (TPG) module, which leverages persistent homology. The TPG module identifies key topological features and guides the convolutional layers by applying persistent homology to feature maps transformed into cubical complexes. Our proposed modules are architecture-agnostic, enabling them to be integrated seamlessly into various architectures. We showcase the effectiveness of our framework in the segmentation task, where preserving the interconnectedness of structures is critical. Experimental results on three diverse datasets demonstrate that our framework effectively preserves the topology in the segmentation downstream task, both quantitatively and qualitatively.

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality


[1202]
H. Zeng, M. Gao and D. Cremers.
CoE: Deep Coupled Embedding for Non-Rigid Point Cloud Correspondences.
Preprint (Dec. 2024). arXiv
Abstract

The interest in matching non-rigidly deformed shapes represented as raw point clouds is rising due to the proliferation of low-cost 3D sensors. Yet, the task is challenging since point clouds are irregular and there is a lack of intrinsic shape information. We propose to tackle these challenges by learning a new shape representation – a per-point high dimensional embedding, in an embedding space where semantically similar points share similar embeddings. The learned embedding has multiple beneficial properties: it is aware of the underlying shape geometry and is robust to shape deformations and various shape artefacts, such as noise and partiality. Consequently, this embedding can be directly employed to retrieve high-quality dense correspondences through a simple nearest neighbor search in the embedding space. Extensive experiments demonstrate new state-of-the-art results and robustness in numerous challenging non-rigid shape matching benchmarks and show its great potential in other shape analysis tasks, such as segmentation.
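Given such embeddings, the retrieval step the abstract describes is literally a nearest-neighbor search; a minimal sketch with synthetic embeddings (brute force here, where a KD-tree or approximate index would be used at scale):

```python
import numpy as np

def dense_correspondences(emb_src, emb_tgt):
    """For each source point, return the index of the target point whose
    learned embedding is nearest in the embedding space."""
    d2 = ((emb_src[:, None, :] - emb_tgt[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

rng = np.random.default_rng(0)
emb_tgt = rng.normal(size=(100, 8))                         # target shape embeddings
perm = rng.permutation(100)
emb_src = emb_tgt[perm] + 0.01 * rng.normal(size=(100, 8))  # noisy permuted copy
match = dense_correspondences(emb_src, emb_tgt)
print(bool((match == perm).all()))
```

The heavy lifting is in learning embeddings where semantically similar points land close together despite deformation, noise, and partiality; once that holds, the matching itself stays this simple.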

MCML Authors
Link to website

Maolin Gao

Computer Vision & Artificial Intelligence

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1201]
A. Kathan, S. Amiriparian, L. Christ, S. Eulitz and B. W. Schuller.
Automatic Speech-Based Charisma Recognition and the Impact of Integrating Auxiliary Characteristics.
TELEPRESENCE 2024 - IEEE Conference on Telepresence. Pasadena, CA, USA, Nov 16-17, 2024. DOI
Abstract

Automatic recognition of speaker’s states and traits is crucial to facilitate a more naturalistic human-AI interaction – a key focus in human-computer interaction to enhance user experience. One particularly important trait in daily life is charisma. To date, its definition is still controversial. However, it seems that there are characteristics in speech that the majority perceives as charismatic. To this end, we address the novel speech-based task of charisma recognition in a three-fold approach. First, we predict charismatic speech using both interpretable acoustic features and embeddings of two audio Transformers. Afterwards, we make use of auxiliary labels that are highly correlated with charisma, including enthusiastic, likeable, attractive, warm, and leader-like, to check their impact on charisma recognition. Finally, we personalise the best model, taking individual speech characteristics into account. In our experiments, we demonstrate that the charisma prediction model benefits from integrating auxiliary characteristics as well as from the personalised approach, resulting in a best Pearson’s correlation coefficient of 0.4304.
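The reported evaluation metric, Pearson's correlation coefficient between predicted and gold charisma ratings, is the covariance of the two score series normalized by the product of their standard deviations; a small sketch with hypothetical ratings (not the paper's data):

```python
import numpy as np

def pearson_r(pred, gold):
    """Pearson's correlation coefficient between two rating series."""
    p, g = pred - pred.mean(), gold - gold.mean()
    return float((p @ g) / np.sqrt((p @ p) * (g @ g)))

gold = np.array([0.2, 0.5, 0.9, 0.4, 0.7])   # hypothetical listener ratings
pred = np.array([0.3, 0.4, 0.8, 0.5, 0.6])   # hypothetical model outputs
print(round(pearson_r(pred, gold), 3))
```

A value of 1 would mean the model ranks and scales charisma exactly as listeners do; the paper's best personalised model reaches 0.4304 on this scale.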

MCML Authors
Link to website

Alexander Kathan

Health Informatics