
Publications by our Members

2025


[1353]
Z. Jonassen, K. Lawrence, B. M. Wiesenfeld, S. Feuerriegel and D. Mann.
A qualitative analysis of remote patient monitoring: how a paradox mindset can support balancing emotional tensions in the design of healthcare technologies.
CSCW 2025 - 28th ACM SIGCHI Conference on Computer-Supported Cooperative Work and Social Computing. Bergen, Norway, Oct 18-22, 2025. To be published. Preprint available. arXiv
Abstract

Remote patient monitoring (RPM) is the use of digital technologies to improve patient care at a distance. However, current RPM solutions are often biased toward tech-savvy patients. To foster health equity, researchers have studied how to address the socio-economic and cognitive needs of diverse patient groups, but their emotional needs have remained largely neglected. We perform the first qualitative study to explore the emotional needs of diverse patients around RPM. Specifically, we conduct a thematic analysis of 18 interviews and 4 focus groups at a large US healthcare organization. We identify emotional needs that lead to four emotional tensions within and across stakeholder groups when applying an equity focus to the design and implementation of RPM technologies. The four emotional tensions are making diverse patients feel: (i) heard vs. exploited; (ii) seen vs. deprioritized for efficiency; (iii) empowered vs. anxious; and (iv) cared for vs. detached from care. To manage these emotional tensions across stakeholders, we develop design recommendations informed by a paradox mindset (i.e., ‘both-and’ rather than ‘either-or’ strategies).

MCML Authors

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1352]
M. Windl, R. Amberg and T. Kosch.
The Illusion of Privacy: Investigating User Misperceptions in Browser Tracking Protection.
CHI 2025 - Conference on Human Factors in Computing Systems. Yokohama, Japan, Apr 26-May 01, 2025. To be published.
Abstract

Third parties track users’ web browsing activities, raising privacy concerns. Tracking protection extensions prevent this, but their influence on privacy protection beliefs shaped by narratives remains uncertain. This paper investigates users’ misperception of the tracking protection offered by browser plugins. Our study explores how different narratives influence users’ perceived privacy protection by examining three tracking protection extension narratives: no protection, functional protection, and a placebo. In a study (N=36), participants evaluated their anticipated protection during a hotel booking process, influenced by the narrative about the plugin’s functionality. However, all participants viewed the same website without tracking protection adaptations. We show that users feel more protected when informed they use a functional or placebo extension, compared to no protection. Our findings highlight the deceptive nature of misleading privacy tools and emphasize the need for greater transparency to keep users from developing a false sense of protection, as such misleading tools also negatively affect user study results.

MCML Authors

Maximiliane Windl

Human-Centered Ubiquitous Media


[1351]
M. Windl, P. Z. Laboda and S. Mayer.
Designing Effective Consent Mechanisms for Spontaneous Interactions in Augmented Reality.
CHI 2025 - Conference on Human Factors in Computing Systems. Yokohama, Japan, Apr 26-May 01, 2025. To be published.
Abstract

Ubiquitous computing devices like Augmented Reality (AR) glasses allow countless spontaneous interactions – all serving different goals. AR devices rely on data transfer to personalize recommendations and adapt to the user. Today’s consent mechanisms, such as privacy policies, are suitable for long-lasting interactions; however, how users can consent to fast, spontaneous interactions is unclear. We first conducted two focus groups (N=17) to identify privacy-relevant scenarios in AR. We then conducted expert interviews (N=11) with co-design activities to establish effective consent mechanisms. Based on that, we contribute (1) a validated scenario taxonomy to define privacy-relevant AR interaction scenarios, (2) a flowchart to decide on the type of mechanisms considering contextual factors, (3) a design continuum and design aspects chart to create the mechanisms, and (4) a trade-off and prediction chart to evaluate the mechanism. Thus, we contribute a conceptual framework fostering a privacy-preserving future with AR.

MCML Authors

Maximiliane Windl

Human-Centered Ubiquitous Media


Sven Mayer

Prof. Dr.

Human-Computer Interaction and Artificial Intelligence


[1350]
M. Windl, P. Thalhammer, D. Müller, A. Schmidt and S. S. Feger.
PrivacyHub: A Functional Tangible and Digital Ecosystem for Interoperable Smart Home Privacy Awareness and Control.
CHI 2025 - Conference on Human Factors in Computing Systems. Yokohama, Japan, Apr 26-May 01, 2025. To be published.
Abstract

Hubs are at the core of most smart homes. Modern cross-ecosystem protocols and standards enable smart home hubs to achieve interoperability across devices, offering the unique opportunity to integrate universally available smart home privacy awareness and control features. To date, such privacy features mainly focus on individual products or prototypical research artifacts. We developed a cross-ecosystem hub featuring a tangible dashboard and a digital web application to deepen our understanding of how smart home users interact with functional privacy features. The ecosystem allows users to control the connectivity states of their devices and raises awareness by visualizing device positions, states, and data flows. We deployed the ecosystem in six households for one week and found that it increased participants’ perceived control, awareness, and understanding of smart home privacy. We further found distinct differences between tangible and digital mechanisms. Our findings highlight the value of cross-ecosystem hubs for effective privacy management.

MCML Authors

Maximiliane Windl

Human-Centered Ubiquitous Media


Albrecht Schmidt

Prof. Dr.

Human-Centered Ubiquitous Media


[1349]
M. Schröder, V. Melnychuk and S. Feuerriegel.
Differentially private learners for heterogeneous treatment effects.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025.
Abstract

Patient data is widely used to estimate heterogeneous treatment effects and understand the effectiveness and safety of drugs. Yet, patient data includes highly sensitive information that must be kept private. In this work, we aim to estimate the conditional average treatment effect (CATE) from observational data under differential privacy. Specifically, we present DP-CATE, a novel framework for CATE estimation that is doubly robust and ensures differential privacy of the estimates. For this, we build upon non-trivial tools from semi-parametric and robust statistics to exploit the connection between privacy and model robustness. Our framework is highly general and applies to any two-stage CATE meta-learner with a Neyman-orthogonal loss function. It can be used with all machine learning models employed for nuisance estimation. We further provide an extension of DP-CATE where we employ RKHS regression to release the complete doubly robust CATE function while ensuring differential privacy. We demonstrate the effectiveness of DP-CATE across various experiments using synthetic and real-world datasets. To the best of our knowledge, we are the first to provide a framework for CATE estimation that is doubly robust and differentially private.

MCML Authors

Maresa Schröder

Artificial Intelligence in Management


Valentyn Melnychuk

Artificial Intelligence in Management


Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1348]
M. Bini, L. Girrbach and Z. Akata.
Decoupling Angles and Strength in Low-rank Adaptation.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published.
Abstract

Parameter-Efficient Fine-Tuning (PEFT) methods have recently gained immense popularity thanks to the wide availability of large-scale models, as they allow pretrained models to be quickly adapted to downstream tasks at minimal computational cost. However, current additive fine-tuning methods such as LoRA show low robustness to prolonged training and hyperparameter choices, preventing optimal out-of-the-box usage. On the other hand, multiplicative and bounded approaches such as ETHER, while providing higher robustness, only allow for extremely low-rank adaptations and are limited to a fixed-strength transformation, hindering the expressive power of the adaptation. In this work, we propose DeLoRA, a fine-tuning method that first normalizes and then scales the learnable low-rank matrices, thus effectively bounding the transformation strength. This leads to increased hyperparameter robustness at no cost in performance. We show that the proposed approach effectively and consistently improves over popular PEFT methods, evaluating it on two fine-tuning tasks: subject-driven image generation and LLM instruction tuning.
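The core idea (normalize the low-rank update, then scale it, so its strength stays bounded) can be sketched in a few lines of NumPy. This is an illustrative reading of the abstract, not the paper's implementation: the function name and the use of a single Frobenius normalization over the full update are assumptions.

```python
import numpy as np

def delora_update(B, A, scale):
    """DeLoRA-style update (sketch): normalize the low-rank product,
    then scale it, so the transformation strength is bounded by `scale`.
    Illustrative reading of the abstract; the paper operates on the
    learnable matrices themselves and details may differ."""
    delta = B @ A                                # plain LoRA-style update
    norm = np.linalg.norm(delta)                 # Frobenius norm
    return scale * delta / max(norm, 1e-12)

rng = np.random.default_rng(0)
B, A = rng.normal(size=(8, 2)), rng.normal(size=(2, 8))
delta_w = delora_update(B, A, scale=0.5)
print(np.linalg.norm(delta_w))  # update magnitude is pinned to 0.5
```

Because the norm of the update is fixed, a poorly chosen learning rate or prolonged training cannot blow up the adaptation strength, which is one plausible source of the reported hyperparameter robustness.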

MCML Authors

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1347]
H. Baniecki, G. Casalicchio, B. Bischl and P. Biecek.
Efficient and Accurate Explanation Estimation with Distribution Compression.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv
Abstract

We discover a theoretical connection between explanation estimation and distribution compression that significantly improves the approximation of feature attributions, importance, and effects. While the exact computation of various machine learning explanations requires numerous model inferences and becomes impractical, the computational cost of approximation increases with an ever-increasing size of data and model parameters. We show that the standard i.i.d. sampling used in a broad spectrum of algorithms for post-hoc explanation leads to an approximation error worthy of improvement. To this end, we introduce Compress Then Explain (CTE), a new paradigm of sample-efficient explainability. It relies on distribution compression through kernel thinning to obtain a data sample that best approximates its marginal distribution. CTE significantly improves the accuracy and stability of explanation estimation with negligible computational overhead. It often achieves an on-par explanation approximation error 2-3x faster by using fewer samples, i.e. requiring 2-3x fewer model evaluations. CTE is a simple, yet powerful, plug-in for any explanation method that now relies on i.i.d. sampling.
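The compress-then-explain recipe can be illustrated with a toy stand-in. The paper uses kernel thinning; the sketch below substitutes greedy kernel herding (a related but different compression scheme), and the RBF kernel, toy model, and all names are assumptions.

```python
import numpy as np

def herding_compress(X, m, gamma=1.0):
    """Greedy kernel herding as a stand-in for the kernel thinning used
    by CTE (assumption): select m points whose empirical distribution
    best matches that of X under an RBF kernel."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)
    target = K.mean(axis=1)            # mean kernel embedding of X
    chosen, acc = [], np.zeros(len(X))
    for t in range(m):
        scores = target - acc / (t + 1)
        scores[chosen] = -np.inf       # no repeated points
        i = int(np.argmax(scores))
        chosen.append(i)
        acc += K[:, i]
    return X[chosen]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
model = lambda X: X @ np.array([1.0, -2.0, 0.5])   # toy model
Xc = herding_compress(X, m=20)
# Background expectation E[f(X)] -- the costly quantity behind many
# post-hoc explanations -- estimated from 20 points instead of 200:
print(model(Xc).mean(), model(X).mean())
```

Any explanation method that averages model evaluations over a background sample can then run on the compressed set, cutting the number of model calls by the compression ratio.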

MCML Authors

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science


Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[1346]
S. Dahan, G. Bénédict, L. Z. J. Williams, Y. Guo, D. Rückert, R. Leech and E. C. Robinson.
SIM: Surface-based fMRI Analysis for Inter-Subject Multimodal Decoding from Movie-Watching Experiments.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv GitHub
Abstract

Current AI frameworks for brain decoding and encoding typically train and test models within the same datasets. This limits their utility for brain-computer interfaces (BCIs) or neurofeedback, for which it would be useful to pool experiences across individuals to better simulate stimuli not sampled during training. A key obstacle to model generalisation is the degree of variability of inter-subject cortical organisation, which makes it difficult to align or compare cortical signals across participants. In this paper, we address this through the use of surface vision transformers, which build a generalisable model of cortical functional dynamics by encoding the topography of cortical networks and their interactions as a moving image across a surface. This is then combined with tri-modal self-supervised contrastive (CLIP) alignment of audio, video, and fMRI modalities to enable the retrieval of visual and auditory stimuli from patterns of cortical activity (and vice versa). We validate our approach on 7T task-fMRI data from 174 healthy participants engaged in the movie-watching experiment from the Human Connectome Project (HCP). Results show that it is possible to detect which movie clips an individual is watching purely from their brain activity, even for individuals and movies not seen during training. Further analysis of attention maps reveals that our model captures individual patterns of brain activity that reflect semantic and visual systems. This opens the door to future personalised simulations of brain function.

MCML Authors

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[1345]
D. Frauen, K. Hess and S. Feuerriegel.
Model-agnostic meta-learners for estimating heterogeneous treatment effects over time.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv
Abstract

Estimating heterogeneous treatment effects (HTEs) over time is crucial in many disciplines such as personalized medicine. For example, electronic health records are commonly collected over several time periods and then used to personalize treatment decisions. Existing works for this task have mostly focused on model-based learners (i.e., learners that adapt specific machine-learning models). In contrast, model-agnostic learners – so-called meta-learners – are largely unexplored. In our paper, we propose several meta-learners that are model-agnostic and thus can be used in combination with arbitrary machine learning models (e.g., transformers) to estimate HTEs over time. Here, our focus is on learners that can be obtained via weighted pseudo-outcome regressions, which allows for efficient estimation by targeting the treatment effect directly. We then provide a comprehensive theoretical analysis that characterizes the different learners and that allows us to offer insights into when specific learners are preferable. Finally, we confirm our theoretical insights through numerical experiments. In sum, while meta-learners are already state-of-the-art for the static setting, we are the first to propose a comprehensive set of meta-learners for estimating HTEs in the time-varying setting.
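A weighted pseudo-outcome regression is easiest to see in the static setting, which the paper generalizes to the time-varying case. The sketch below implements the classical DR-learner pseudo-outcome with oracle nuisance functions on synthetic data; in practice the nuisances would be arbitrary ML estimates, which is exactly what makes the learner model-agnostic.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
X = rng.uniform(-2, 2, size=(n, 1))
e = 1 / (1 + np.exp(-X[:, 0]))           # true propensity score
T = rng.binomial(1, e)
tau = 1.0 + X[:, 0]                      # true heterogeneous effect
Y = X[:, 0] + T * tau + rng.normal(size=n)

# Oracle nuisance functions for brevity; a meta-learner would plug in
# any fitted ML models here.
mu0, mu1 = X[:, 0], X[:, 0] + tau

# DR-learner pseudo-outcome: regressing it on X targets the CATE directly.
phi = (T - e) / (e * (1 - e)) * (Y - np.where(T == 1, mu1, mu0)) + mu1 - mu0

# Second stage: any regressor works; here ordinary least squares.
Z = np.c_[np.ones(n), X]
beta = np.linalg.lstsq(Z, phi, rcond=None)[0]
print(beta)  # approximately [1.0, 1.0], i.e. tau(x) = 1 + x
```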

MCML Authors

Dennis Frauen

Artificial Intelligence in Management


Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1344]
F. Fumagalli, M. Muschalik, P. Frazzetto, J. Strotherm, L. Hermes, A. Sperduti, E. Hüllermeier and B. Hammer.
Exact Computation of Any-Order Shapley Interactions for Graph Neural Networks.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv
Abstract

Despite the ubiquitous use of Graph Neural Networks (GNNs) in machine learning (ML) prediction tasks involving graph-structured data, their interpretability remains challenging. In explainable artificial intelligence (XAI), the Shapley Value (SV) is the predominant method to quantify contributions of individual features to an ML model’s output. Addressing the limitations of SVs in complex prediction models, Shapley Interactions (SIs) extend the SV to groups of features. In this work, we explain single graph predictions of GNNs with SIs that quantify node contributions and interactions among multiple nodes. By exploiting the GNN architecture, we show that the structure of interactions in node embeddings is preserved for graph prediction. As a result, the exponential complexity of SIs depends only on the receptive fields, i.e., the message-passing ranges determined by the connectivity of the graph and the number of convolutional layers. Based on our theoretical results, we introduce GraphSHAP-IQ, an efficient approach to compute any-order SIs exactly. GraphSHAP-IQ is applicable to popular message-passing techniques in conjunction with a linear global pooling and output layer. We showcase that GraphSHAP-IQ substantially reduces the exponential complexity of computing exact SIs on multiple benchmark datasets. Beyond exact computation, we evaluate GraphSHAP-IQ’s approximation of SIs on popular GNN architectures and compare it with existing baselines. Lastly, we visualize SIs of real-world water distribution networks and molecule structures using SI-Graphs.
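For intuition, the exponential baseline that such methods improve on looks like this: exact Shapley values (the order-1 case of SIs) computed by enumerating every coalition of a toy edge-counting game. This is textbook Shapley computation on an invented game, not the paper's GraphSHAP-IQ algorithm.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values by enumerating all coalitions.
    `v` maps a frozenset coalition to its worth."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                S = frozenset(S)
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (v(S | {i}) - v(S))  # weighted marginal contribution
        phi[i] = total
    return phi

# Toy "graph" game: a coalition's worth is the number of edges it contains.
edges = {frozenset({1, 2}), frozenset({2, 3})}
v = lambda S: sum(1 for edge in edges if edge <= S)
phi = shapley_values([1, 2, 3], v)
print(phi)  # each edge's worth splits between its endpoints: {1: 0.5, 2: 1.0, 3: 0.5}
```

The double loop over coalitions is what scales as 2^n in the number of players; restricting the game to a node's receptive field, as the paper does, shrinks the exponent.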

MCML Authors

Maximilian Muschalik

Artificial Intelligence & Machine Learning


Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[1343]
L. Girrbach, Y. Huang, S. Alaniz, T. Darrell and Z. Akata.
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs).
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv
Abstract

Pre-trained large language models (LLMs) have been reliably integrated with visual input for multimodal tasks. The widespread adoption of instruction-tuned image-to-text vision-language assistants (VLAs) like LLaVA and InternVL necessitates evaluating gender biases. We study gender bias in 22 popular open-source VLAs with respect to personality traits, skills, and occupations. Our results show that VLAs replicate human biases likely present in the data, such as real-world occupational imbalances. Similarly, they tend to attribute more skills and positive personality traits to women than to men, and we see a consistent tendency to associate negative personality traits with men. To eliminate the gender bias in these models, we find that finetuning-based debiasing methods achieve the best tradeoff between debiasing and retaining performance on downstream tasks. We argue for pre-deploying gender bias assessment in VLAs and motivate further development of debiasing strategies to ensure equitable societal outcomes.

MCML Authors

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1342]
K. Hess and S. Feuerriegel.
Stabilized Neural Prediction of Potential Outcomes in Continuous Time.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv
Abstract

Patient trajectories from electronic health records are widely used to predict potential outcomes of treatments over time, which then allows care to be personalized. Yet, existing neural methods for this purpose have a key limitation: while some adjust for time-varying confounding, these methods assume that the time series are recorded in discrete time. In other words, they are constrained to settings where measurements and treatments are conducted at fixed time steps, even though this is unrealistic in medical practice. In this work, we aim to predict potential outcomes in continuous time. The latter is of direct practical relevance because it allows for modeling patient trajectories where measurements and treatments take place at arbitrary, irregular timestamps. We thus propose a new method called stabilized continuous time inverse propensity network (SCIP-Net). For this, we further derive stabilized inverse propensity weights for robust prediction of the potential outcomes. To the best of our knowledge, our SCIP-Net is the first neural method that performs proper adjustments for time-varying confounding in continuous time.
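A discrete-time simplification conveys what stabilized inverse propensity weights do (the paper derives them in continuous time, which is its contribution): per patient, the ratio of marginal to history-conditional treatment probabilities is multiplied over time steps, and the marginal in the numerator keeps the product from exploding. The toy data and names below are assumptions.

```python
import numpy as np

def stabilized_weights(treatments, marginal_p, conditional_p):
    """Discrete-time sketch of stabilized inverse propensity weights:
    for each patient, multiply p(A_t) / p(A_t | history) over time
    steps. Dividing by the marginal stabilizes the weights."""
    ratio = np.where(treatments == 1,
                     marginal_p / conditional_p,
                     (1 - marginal_p) / (1 - conditional_p))
    return ratio.prod(axis=1)  # product over time steps

rng = np.random.default_rng(3)
n, T = 1000, 5
cond = np.clip(rng.beta(2, 2, size=(n, T)), 0.05, 0.95)  # p(A_t=1 | history)
A = rng.binomial(1, cond)
marg = cond.mean(axis=0, keepdims=True)                  # marginal p(A_t=1)
sw = stabilized_weights(A, marg, cond)
print(sw.mean())  # stabilized weights average close to 1
```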

MCML Authors

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1341]
Y. Li, D. Rügamer, B. Bischl and M. Rezaei.
Calibrating LLMs with Information-Theoretic Evidential Deep Learning.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. URL
Abstract

Fine-tuned large language models (LLMs) often exhibit overconfidence, particularly when trained on small datasets, resulting in poor calibration and inaccurate uncertainty estimates. Evidential Deep Learning (EDL), an uncertainty-aware approach, enables uncertainty estimation in a single forward pass, making it a promising method for calibrating fine-tuned LLMs. However, despite its computational efficiency, EDL is prone to overfitting, as its training objective can result in overly concentrated probability distributions. To mitigate this, we propose regularizing EDL by incorporating an information bottleneck (IB). Our approach IB-EDL suppresses spurious information in the evidence generated by the model and encourages truly predictive information to influence both the predictions and uncertainty estimates. Extensive experiments across various fine-tuned LLMs and tasks demonstrate that IB-EDL outperforms both existing EDL and non-EDL approaches. By improving the trustworthiness of LLMs, IB-EDL facilitates their broader adoption in domains requiring high levels of confidence calibration.
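The EDL backbone being calibrated can be sketched in plain Python: evidence from one forward pass parameterizes a Dirichlet, yielding class probabilities and a scalar uncertainty mass at once. The information-bottleneck regularizer that is the paper's contribution is omitted here, and the softplus evidence function is a common but assumed choice.

```python
import math

def edl_head(logits):
    """Evidential Deep Learning head (sketch): softplus turns logits into
    non-negative evidence, Dirichlet parameters are alpha = evidence + 1,
    and u = K / sum(alpha) is a single scalar uncertainty mass -- all
    obtained in one forward pass, with no sampling."""
    evidence = [math.log1p(math.exp(z)) for z in logits]  # softplus
    alpha = [ev + 1.0 for ev in evidence]
    S = sum(alpha)
    probs = [a / S for a in alpha]
    u = len(alpha) / S
    return probs, u

# Almost no evidence for any class -> uncertainty near its maximum of 1.
probs, u = edl_head([-9.0, -9.0, -9.0])
print(u)
```

Overfitting shows up here as overly large evidence (a concentrated Dirichlet with u near 0 even off-distribution); suppressing spurious evidence is precisely what the proposed IB regularizer targets.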

MCML Authors

Yawei Li

Statistical Learning & Data Science


David Rügamer

Prof. Dr.

Data Science Group


Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


Mina Rezaei

Dr.

Statistical Learning & Data Science


[1340]
L. Lux, A. H. Berger, A. Weers, N. Stucki, D. Rückert, U. Bauer and J. C. Paetzold.
Topograph: An efficient Graph-Based Framework for Strictly Topology Preserving Image Segmentation.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv
Abstract

Topological correctness plays a critical role in many image segmentation tasks, yet most networks are trained using pixel-wise loss functions, such as Dice, neglecting topological accuracy. Existing topology-aware methods often lack robust topological guarantees, are limited to specific use cases, or impose high computational costs. In this work, we propose a novel, graph-based framework for topologically accurate image segmentation that is both computationally efficient and generally applicable. Our method constructs a component graph that fully encodes the topological information of both the prediction and ground truth, allowing us to efficiently identify topologically critical regions and aggregate a loss based on local neighborhood information. Furthermore, we introduce a strict topological metric capturing the homotopy equivalence between the union and intersection of prediction-label pairs. We formally prove the topological guarantees of our approach and empirically validate its effectiveness on binary and multi-class datasets. Our loss demonstrates state-of-the-art performance with up to fivefold faster loss computation compared to persistent homology methods.
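The kind of topological bookkeeping a component graph encodes can be illustrated with a union-find count of connected components (Betti-0): a one-pixel error can leave pixel-wise losses almost unchanged while merging two structures into one. This sketches the concept only, not the paper's graph construction or loss.

```python
def components(mask):
    """Count 4-connected foreground components of a binary mask
    using union-find with path halving."""
    h, w = len(mask), len(mask[0])
    parent = {(y, x): (y, x) for y in range(h) for x in range(w) if mask[y][x]}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                if y and mask[y - 1][x]:       # merge with pixel above
                    parent[find((y, x))] = find((y - 1, x))
                if x and mask[y][x - 1]:       # merge with pixel to the left
                    parent[find((y, x))] = find((y, x - 1))
    return len({find(p) for p in parent})

label = [[1, 1, 0, 1],
         [0, 0, 0, 1]]   # two separate structures
pred  = [[1, 1, 1, 1],
         [0, 0, 0, 1]]   # one extra pixel, but the structures merge
print(components(label), components(pred))  # -> 2 1
```

A pixel-wise loss barely penalizes the single wrong pixel in `pred`, yet the prediction has the wrong topology; a topology-aware loss flags exactly this kind of critical region.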

MCML Authors

Laurin Lux

Artificial Intelligence in Healthcare and Medicine


Nico Stucki

Applied Topology and Geometry


Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


Ulrich Bauer

Prof. Dr.

Applied Topology and Geometry


[1339]
L. Sang, Z. Canfes, D. Cao, F. Bernard and D. Cremers.
Implicit Neural Surface Deformation with Explicit Velocity Fields.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv
Abstract

In this work, we introduce the first unsupervised method that simultaneously predicts time-varying neural implicit surfaces and deformations between pairs of point clouds. We propose to model the point movement using an explicit velocity field and directly deform a time-varying implicit field using the modified level-set equation. This equation utilizes an iso-surface evolution with Eikonal constraints in a compact formulation, ensuring the integrity of the signed distance field. By applying a smooth, volume-preserving constraint to the velocity field, our method successfully recovers physically plausible intermediate shapes. Our method is able to handle both rigid and non-rigid deformations without any intermediate shape supervision. Our experimental results demonstrate that our method significantly outperforms existing works, delivering superior results in both quality and efficiency.

MCML Authors

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1338]
M. Schröder, V. Melnychuk and S. Feuerriegel.
Differentially private learners for heterogeneous treatment effects.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. URL
Abstract

Patient data is widely used to estimate heterogeneous treatment effects and understand the effectiveness and safety of drugs. Yet, patient data includes highly sensitive information that must be kept private. In this work, we aim to estimate the conditional average treatment effect (CATE) from observational data under differential privacy. Specifically, we present DP-CATE, a novel framework for CATE estimation that is doubly robust and ensures differential privacy of the estimates. For this, we build upon non-trivial tools from semi-parametric and robust statistics to exploit the connection between privacy and model robustness. Our framework is highly general and applies to any two-stage CATE meta-learner with a Neyman-orthogonal loss function. It can be used with all machine learning models employed for nuisance estimation. We further provide an extension of DP-CATE where we employ RKHS regression to release the complete doubly robust CATE function while ensuring differential privacy. We demonstrate the effectiveness of DP-CATE across various experiments using synthetic and real-world datasets. To the best of our knowledge, we are the first to provide a framework for CATE estimation that is doubly robust and differentially private.
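For background, the classical Gaussian mechanism shows how a statistic is released under (ε, δ)-differential privacy by adding noise calibrated to its sensitivity. DP-CATE's actual noise calibration goes through the estimator's Neyman-orthogonal structure and is more involved, so treat this only as the textbook building block.

```python
import math, random

def gaussian_mechanism(value, sensitivity, epsilon, delta, rng=random):
    """Classical (epsilon, delta)-DP Gaussian mechanism: add zero-mean
    Gaussian noise whose scale grows with the statistic's sensitivity
    and shrinks with the privacy budget epsilon."""
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    return value + rng.gauss(0.0, sigma), sigma

rng = random.Random(0)
private_est, sigma = gaussian_mechanism(0.31, sensitivity=0.01,
                                        epsilon=1.0, delta=1e-5, rng=rng)
print(sigma)  # noise scale: smaller epsilon (more privacy) -> more noise
```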

MCML Authors

Maresa Schröder

Artificial Intelligence in Management


Valentyn Melnychuk

Artificial Intelligence in Management


Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1337]
E. Sommer, J. Robnik, G. Nozadze, U. Seljak and D. Rügamer.
Microcanonical Langevin Ensembles: Advancing the Sampling of Bayesian Neural Networks.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. URL
Abstract

Despite recent advances, sampling-based inference for Bayesian Neural Networks (BNNs) remains a significant challenge in probabilistic deep learning. While sampling-based approaches do not require a variational distribution assumption, current state-of-the-art samplers still struggle to navigate the complex and highly multimodal posteriors of BNNs. As a consequence, sampling still requires considerably longer inference times than non-Bayesian methods even for small neural networks, despite recent advances in making software implementations more efficient. Besides the difficulty of finding high-probability regions, the time until samplers provide sufficient exploration of these areas remains unpredictable. To tackle these challenges, we introduce an ensembling approach that leverages strategies from optimization and a recently proposed sampler called Microcanonical Langevin Monte Carlo (MCLMC) for efficient, robust and predictable sampling performance. Compared to approaches based on the state-of-the-art No-U-Turn Sampler, our approach delivers substantial speedups up to an order of magnitude, while maintaining or improving predictive performance and uncertainty quantification across diverse tasks and data modalities. The suggested Microcanonical Langevin Ensembles and modifications to MCLMC additionally enhance the method’s predictability in resource requirements, facilitating easier parallelization. All in all, the proposed method offers a promising direction for practical, scalable inference for BNNs.

MCML Authors

David Rügamer

Prof. Dr.

Data Science Group


[1336]
T. Uscidda, L. Eyring, K. Roth, F. J. Theis, Z. Akata and M. Cuturi.
Disentangled Representation Learning with the Gromov-Monge Gap.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv
Abstract

Learning disentangled representations from unlabelled data is a fundamental challenge in machine learning. Solving it may unlock other problems, such as generalization, interpretability, or fairness. Although remarkably challenging to solve in theory, disentanglement is often achieved in practice through prior matching. Furthermore, recent works have shown that prior matching approaches can be enhanced by leveraging geometrical considerations, e.g., by learning representations that preserve geometric features of the data, such as distances or angles between points. However, matching the prior while preserving geometric features is challenging, as a mapping that fully preserves these features while aligning the data distribution with the prior does not exist in general. To address these challenges, we introduce a novel approach to disentangled representation learning based on quadratic optimal transport. We formulate the problem using Gromov-Monge maps that transport one distribution onto another with minimal distortion of predefined geometric features, preserving them as much as can be achieved. To compute such maps, we propose the Gromov-Monge-Gap (GMG), a regularizer quantifying whether a map moves a reference distribution with minimal geometry distortion. We demonstrate the effectiveness of our approach for disentanglement across four standard benchmarks, outperforming other methods leveraging geometric considerations.

MCML Authors

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems


Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1335]
Y. Wang, M. Schröder, D. Frauen, J. Schweisthal, K. Hess and S. Feuerriegel.
Constructing Confidence Intervals for Average Treatment Effects from Multiple Datasets.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv
Abstract

Constructing confidence intervals (CIs) for the average treatment effect (ATE) from patient records is crucial to assess the effectiveness and safety of drugs. However, patient records typically come from different hospitals, thus raising the question of how multiple observational datasets can be effectively combined for this purpose. In our paper, we propose a new method that estimates the ATE from multiple observational datasets and provides valid CIs. Our method makes few assumptions about the observational datasets and is thus widely applicable in medical practice. The key idea of our method is that we leverage prediction-powered inferences and thereby essentially ‘shrink’ the CIs so that we offer more precise uncertainty quantification compared to naïve approaches. We further prove the unbiasedness of our method and the validity of our CIs. We confirm our theoretical results through various numerical experiments. Finally, we provide an extension of our method for constructing CIs from combinations of experimental and observational datasets.
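The prediction-powered idea behind the shrunken CIs can be sketched for a simple mean: model predictions on a large unlabeled set plus a 'rectifier' that corrects their bias using the small labeled set. Function names and the toy data are assumptions; the paper applies this machinery to ATEs pooled across datasets rather than to a plain mean.

```python
import math, random

def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def ppi_mean_ci(y_lab, f_lab, f_unlab, z=1.96):
    """Prediction-powered estimate of a mean: predictions on a large
    unlabeled set, plus a rectifier (mean prediction error) from the
    small labeled set. The two variance terms add up to a
    normal-approximation CI that is typically narrower than one built
    from the labeled outcomes alone."""
    n, N = len(y_lab), len(f_unlab)
    rect = [y - f for y, f in zip(y_lab, f_lab)]
    est = sum(f_unlab) / N + sum(rect) / n
    half = z * math.sqrt(sample_var(f_unlab) / N + sample_var(rect) / n)
    return est - half, est, est + half

rng = random.Random(4)
truth = 2.0
y_lab = [truth + rng.gauss(0, 1) for _ in range(100)]        # small labeled set
f_lab = [y - 0.9 + rng.gauss(0, 0.1) for y in y_lab]         # biased model
f_unlab = [truth - 0.9 + rng.gauss(0, 1) for _ in range(10000)]
lo, est, hi = ppi_mean_ci(y_lab, f_lab, f_unlab)
print(lo, est, hi)  # interval centered near the true mean 2.0
```

Note that the rectifier removes the model's bias (here −0.9), so the CI is valid even though the predictions themselves are systematically wrong.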

MCML Authors

Yuxin Wang

Artificial Intelligence in Management


Maresa Schröder

Artificial Intelligence in Management


Dennis Frauen

Artificial Intelligence in Management


Jonas Schweisthal

Artificial Intelligence in Management


Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1334]
Ö. Turgut, P. Müller, P. Hager, S. Shit, S. Starck, M. J. Menten, E. Martens and D. Rückert.
Unlocking the diagnostic potential of electrocardiograms through information transfer from cardiac magnetic resonance imaging.
Medical Image Analysis 101.103451 (Apr. 2025). DOI GitHub
Abstract

Cardiovascular diseases (CVD) can be diagnosed using various diagnostic modalities. The electrocardiogram (ECG) is a cost-effective and widely available diagnostic aid that provides functional information of the heart. However, its ability to classify and spatially localise CVD is limited. In contrast, cardiac magnetic resonance (CMR) imaging provides detailed structural information of the heart and thus enables evidence-based diagnosis of CVD, but long scan times and high costs limit its use in clinical routine. In this work, we present a deep learning strategy for cost-effective and comprehensive cardiac screening solely from ECG. Our approach combines multimodal contrastive learning with masked data modelling to transfer domain-specific information from CMR imaging to ECG representations. In extensive experiments using data from 40,044 UK Biobank subjects, we demonstrate the utility and generalisability of our method for subject-specific risk prediction of CVD and the prediction of cardiac phenotypes using only ECG data. Specifically, our novel multimodal pre-training paradigm improves performance by up to 12.19% for risk prediction and 27.59% for phenotype prediction. In a qualitative analysis, we demonstrate that our learned ECG representations incorporate information from CMR image regions of interest.
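The multimodal contrastive component can be sketched as a symmetric InfoNCE (CLIP-style) loss between paired embeddings of the two modalities, e.g. ECG and CMR views of the same subject; the masked data modelling part and all architectural details are omitted, and the temperature value is an assumption.

```python
import numpy as np

def logsumexp(x, axis):
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def clip_loss(emb_a, emb_b, temperature=0.07):
    """Symmetric InfoNCE between two batches of paired embeddings:
    each row must pick out its own partner among all candidates in
    the batch, in both directions."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature               # pairwise cosine similarities
    loss_ab = -np.mean(np.diag(logits - logsumexp(logits, axis=1)))
    loss_ba = -np.mean(np.diag(logits.T - logsumexp(logits.T, axis=1)))
    return (loss_ab + loss_ba) / 2

rng = np.random.default_rng(5)
ecg = rng.normal(size=(16, 32))                        # toy ECG embeddings
aligned = clip_loss(ecg, ecg + 0.01 * rng.normal(size=(16, 32)))
shuffled = clip_loss(ecg, np.roll(ecg, 1, axis=0))     # mismatched pairs
print(aligned < shuffled)  # matched pairs give the lower loss
```

Minimizing this loss pulls each subject's ECG embedding toward their own CMR embedding and away from everyone else's, which is how structural information from imaging can shape the ECG representation.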

MCML Authors
Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[1333]
Y.-J. Li, M. Gladkova, Y. Xia, R. Wang and D. Cremers.
VXP: Voxel-Cross-Pixel Large-scale Image-LiDAR Place Recognition.
3DV 2025 - 12th International Conference on 3D Vision. Singapore, Mar 25-28, 2025. To be published. Preprint available. arXiv
Abstract

Recent works on global place recognition treat the task as a retrieval problem, where an off-the-shelf global descriptor is commonly designed in image-based and LiDAR-based modalities. However, it is non-trivial to perform accurate image-LiDAR global place recognition since extracting consistent and robust global descriptors from different domains (2D images and 3D point clouds) is challenging. To address this issue, we propose a novel Voxel-Cross-Pixel (VXP) approach, which establishes voxel and pixel correspondences in a self-supervised manner and brings them into a shared feature space. Specifically, VXP is trained in a two-stage manner that first explicitly exploits local feature correspondences and enforces similarity of global descriptors. Extensive experiments on three benchmarks (Oxford RobotCar, ViViD++ and KITTI) demonstrate that our method surpasses state-of-the-art cross-modal retrieval by a large margin.

MCML Authors
Link to website

Mariia Gladkova

Computer Vision & Artificial Intelligence

Link to website

Yan Xia

Dr.

Computer Vision & Artificial Intelligence

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1332]
L. Zumeta-Olaskoaga, A. Bender and D.-J. Lee.
Flexible modelling of time-varying exposures in event history analysis.
DAGStat 2025 - 7th Joint Statistical Meeting of the Deutsche Arbeitsgemeinschaft Statistik. Berlin, Germany, Mar 24-28, 2025. Poster presentation. Full paper available. DOI
Abstract

We present a flexible modelling approach to analyse time-varying exposures and recurrent events in team sports injuries. The approach is based on the piece-wise exponential additive mixed model where the effects of past exposures (i.e. high-intensity training loads) may accumulate over time and present complex forms of association. In order to identify a relevant time window at which past exposures have an impact on the current risk, we propose a penalty approach. We conduct a simulation study to evaluate the performance of the proposed model, under different true weight functions and different levels of heterogeneity between recurrent events. Finally, we illustrate the approach with a case study application involving an elite male football team participating in the Spanish LaLiga competition. The cohort includes time-loss injuries and external training load variables tracked by Global Positioning System devices, during the seasons 2017–2018 and 2018–2019.
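The core modelling idea — past exposures accumulating with a lag-dependent weight — can be sketched as a weighted cumulative exposure. The decay function below is a hypothetical stand-in for the estimated weight function, purely for illustration:

```python
import math

def cumulative_effect(loads, weight, t):
    """Weighted cumulative exposure at time t: every past load contributes
    according to a lag-dependent weight w(t - s)."""
    return sum(weight(t - s) * load for s, load in enumerate(loads[: t + 1]))

# toy weight function: the effect of a training load decays with its lag
decay = lambda lag: math.exp(-0.3 * lag)

daily_loads = [10.0, 0.0, 20.0, 5.0]   # e.g. daily external training loads
effect_today = cumulative_effect(daily_loads, decay, t=3)
```

In the paper the weight function is estimated from data (with a penalty to identify the relevant lag window) rather than fixed in advance.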

MCML Authors
Link to website

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)


[1331]
L. Bothmann, S. Dandl, J. M. Alvarez, P. A. Boustani and B. Bischl.
Privilege Scores for Fairness-Aware ML.
DAGStat 2025 - 7th Joint Statistical Meeting of the Deutsche Arbeitsgemeinschaft Statistik. Berlin, Germany, Mar 24-28, 2025. Poster presentation. Preprint available.
Abstract

Bias-preserving methods in fairness-aware machine learning (fairML) focus on metrics that prioritize formal equality by balancing error rates across subgroups. These methods can perpetuate historical discrimination embedded in real-world data. In contrast, bias-transforming methods aim for substantive equality by actively addressing historical inequalities. As a contribution to bias-transforming methods, we introduce the concept of privilege scores, a novel approach to identifying and quantifying individual privilege in machine learning tasks. Privilege scores use causal inference techniques to compare real-world outcomes to those in a ‘fair’ world in which the protected attributes do not influence the target variable. This individual-level perspective provides actionable insights for applications such as affirmative action and beyond. Key contributions include (1) the formalization of privilege scores, (2) a methodological framework for estimation with uncertainty quantification via confidence intervals, (3) an interpretable machine learning approach for understanding privilege score contributions, and (4) a novel in-processing method, Multi-PrivScore, to mitigate model-level discrimination during model training. Experiments on simulated and real-world data demonstrate the usefulness of privilege scores. Overall, our work highlights privilege scores as a versatile tool for assessing and mitigating historical discrimination in various machine learning applications.

MCML Authors
Link to website

Ludwig Bothmann

Dr.

Statistical Learning & Data Science

Link to website

Philip Amir Boustani

Statistical Learning & Data Science

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[1330]
R. Amoroso, G. Zhang, R. Koner, L. Baraldi, R. Cucchiara and V. Tresp.
Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries.
WACV 2025 - IEEE/CVF Winter Conference on Applications of Computer Vision. Tucson, AZ, USA, Feb 28-Mar 04, 2025. To be published. Preprint available. arXiv
Abstract

Video Question Answering (Video QA) is a challenging video understanding task that requires models to comprehend entire videos, identify the most relevant information based on contextual cues from a given question, and reason accurately to provide answers. Recent advancements in Multimodal Large Language Models (MLLMs) have transformed video QA by leveraging their exceptional commonsense reasoning capabilities. This progress is largely driven by the effective alignment between visual data and the language space of MLLMs. However, for video QA, an additional space-time alignment poses a considerable challenge for extracting question-relevant information across frames. In this work, we investigate diverse temporal modeling techniques to integrate with MLLMs, aiming to achieve question-guided temporal modeling that leverages pre-trained visual and textual alignment in MLLMs. We propose T-Former, a novel temporal modeling method that creates a question-guided temporal bridge between frame-wise visual perception and the reasoning capabilities of LLMs. Our evaluation across multiple video QA benchmarks demonstrates that T-Former competes favorably with existing temporal modeling approaches and aligns with recent advancements in video QA.

MCML Authors
Link to website

Gengyuan Zhang

Database Systems & Data Mining

Link to website

Rajat Koner

Database Systems & Data Mining

Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[1329]
A. H. Berger, L. Lux, S. Shit, I. Ezhov, G. Kaissis, M. Menten, D. Rückert and J. C. Paetzold.
Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers.
WACV 2025 - IEEE/CVF Winter Conference on Applications of Computer Vision. Tucson, AZ, USA, Feb 28-Mar 04, 2025. To be published. Preprint available. arXiv
Abstract

Direct image-to-graph transformation is a challenging task that involves solving object detection and relationship prediction in a single model. Due to this task’s complexity, large training datasets are rare in many domains, making the training of deep-learning methods challenging. This data sparsity necessitates transfer learning strategies akin to the state-of-the-art in general computer vision. In this work, we introduce a set of methods enabling cross-domain and cross-dimension learning for image-to-graph transformers. We propose (1) a regularized edge sampling loss to effectively learn object relations in multiple domains with different numbers of edges, (2) a domain adaptation framework for image-to-graph transformers aligning image- and graph-level features from different domains, and (3) a projection function that allows using 2D data for training 3D transformers. We demonstrate our method’s utility in cross-domain and cross-dimension experiments, where we utilize labeled data from 2D road networks for simultaneous learning in vastly different target domains. Our method consistently outperforms standard transfer learning and self-supervised pretraining on challenging benchmarks, such as retinal or whole-brain vessel graph extraction.

MCML Authors
Link to website

Laurin Lux

Artificial Intelligence in Healthcare and Medicine

Link to Profile Georgios Kaissis

Georgios Kaissis

Dr.

Privacy-Preserving and Trustworthy AI

Link to Profile Martin Menten

Martin Menten

Dr.

AI in Medicine

Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[1328]
H. Chen, D. Krompass, J. Gu and V. Tresp.
FedPop: Federated Population-based Hyperparameter Tuning.
AAAI 2025 - 39th Conference on Artificial Intelligence. Philadelphia, PA, USA, Feb 25-Mar 04, 2025. To be published. Preprint available. arXiv
Abstract

Federated Learning (FL) is a distributed machine learning (ML) paradigm, in which multiple clients collaboratively train ML models without centralizing their local data. Similar to conventional ML pipelines, the client local optimization and server aggregation procedure in FL are sensitive to the hyperparameter (HP) selection. Despite extensive research on tuning HPs for centralized ML, these methods yield suboptimal results when employed in FL. This is mainly because their ‘training-after-tuning’ framework is unsuitable for FL with limited client computation power. While some approaches have been proposed for HP-Tuning in FL, they are limited to the HPs for client local updates. In this work, we propose a novel HP-tuning algorithm, called Federated Population-based Hyperparameter Tuning (FedPop), to address this vital yet challenging problem. FedPop employs population-based evolutionary algorithms to optimize the HPs, which accommodates various HP types at both the client and server sides. Compared with prior tuning methods, FedPop employs an online ‘tuning-while-training’ framework, offering computational efficiency and enabling the exploration of a broader HP search space. Our empirical validation on the common FL benchmarks and complex real-world FL datasets, including full-sized Non-IID ImageNet-1K, demonstrates the effectiveness of the proposed method, which substantially outperforms the concurrent state-of-the-art HP-tuning methods in FL.
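Population-based HP tuning of this kind maintains a population of configurations and evolves it while training proceeds. A minimal sketch of one such evolutionary step (toy objective and perturbation range are assumptions, not FedPop's actual schedule):

```python
import random

def tuning_round(population, evaluate, rng):
    """One 'tuning-while-training' step (sketch): score every configuration,
    then replace the worst member with a perturbed copy of the best."""
    ranked = sorted(population, key=evaluate, reverse=True)
    best = ranked[0]
    mutated = {k: v * rng.uniform(0.8, 1.25) for k, v in best.items()}
    return ranked[:-1] + [mutated]

rng = random.Random(0)
# toy stand-in for federated validation: score peaks at lr = 0.1
evaluate = lambda hp: -abs(hp["lr"] - 0.1)
population = [{"lr": rng.uniform(0.001, 1.0)} for _ in range(6)]
for _ in range(200):
    population = tuning_round(population, evaluate, rng)
best_lr = max(population, key=evaluate)["lr"]
```

Because tuning happens online, no separate tuning-then-retraining phase is needed — the property the abstract contrasts with 'training-after-tuning' methods.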

MCML Authors
Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[1327]
A. Davtyan, S. Sameni, B. Ommer and P. Favaro.
CAGE: Unsupervised Visual Composition and Animation for Controllable Video Generation.
AAAI 2025 - 39th Conference on Artificial Intelligence. Philadelphia, PA, USA, Feb 25-Mar 04, 2025. To be published. Preprint available. arXiv GitHub
Abstract

In this work we propose a novel method for unsupervised controllable video generation. Once trained on a dataset of unannotated videos, at inference our model is capable of both composing scenes of predefined object parts and animating them in a plausible and controlled way. This is achieved by conditioning video generation on a randomly selected subset of local pre-trained self-supervised features during training. We call our model CAGE for visual Composition and Animation for video GEneration. We conduct a series of experiments to demonstrate capabilities of CAGE in various settings.

MCML Authors
Link to Profile Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[1326]
M. Gui, J. Schusterbauer, U. Prestel, P. Ma, D. Kotovenko, O. Grebenkova, S. A. Baumann, V. Hu and B. Ommer.
DepthFM: Fast Monocular Depth Estimation with Flow Matching.
AAAI 2025 - 39th Conference on Artificial Intelligence. Philadelphia, PA, USA, Feb 25-Mar 04, 2025. To be published. Preprint available.
Abstract

Current discriminative depth estimation methods often produce blurry artifacts, while generative approaches suffer from slow sampling due to curvatures in the noise-to-depth transport. Our method addresses these challenges by framing depth estimation as a direct transport between image and depth distributions. We are the first to explore flow matching in this field, and we demonstrate that its interpolation trajectories enhance both training and sampling efficiency while preserving high performance. While generative models typically require extensive training data, we mitigate this dependency by integrating external knowledge from a pre-trained image diffusion model, enabling effective transfer even across differing objectives. To further boost our model performance, we employ synthetic data and utilize image-depth pairs generated by a discriminative model on an in-the-wild image dataset. As a generative model, our model can reliably estimate depth confidence, which provides an additional advantage. Our approach achieves competitive zero-shot performance on standard benchmarks of complex natural scenes while improving sampling efficiency and only requiring minimal synthetic data for training.
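The flow-matching formulation mentioned above regresses a velocity field along straight interpolation paths between the two distributions. A minimal sketch of the training pair construction (toy vectors; the real model operates on images with a neural velocity network):

```python
def flow_matching_pair(x0, x1, t):
    """Along the straight path x_t = (1 - t) * x0 + t * x1, the regression
    target for the velocity network is the constant x1 - x0."""
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity_target = [b - a for a, b in zip(x0, x1)]
    return x_t, velocity_target

noise = [0.0, 1.0]   # sample from the source distribution
depth = [2.0, 3.0]   # toy 'depth map' (flattened)
x_half, v = flow_matching_pair(noise, depth, t=0.5)
```

The straight trajectories are what make sampling efficient: integrating a nearly constant velocity needs far fewer steps than following the curved paths of diffusion models.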

MCML Authors
Link to website

Pingchuan Ma

Machine Vision & Learning

Link to website

Olga Grebenkova

Machine Vision & Learning

Link to Profile Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[1325]
Z. Li, S. S. Cranganore, N. Youngblut and N. Kilbertus.
Whole Genome Transformer for Gene Interaction Effects in Microbiome Habitat Specificity.
AAAI 2025 - 39th Conference on Artificial Intelligence. Philadelphia, PA, USA, Feb 25-Mar 04, 2025. To be published. Preprint available. arXiv
Abstract

Leveraging the vast genetic diversity within microbiomes offers unparalleled insights into complex phenotypes, yet the task of accurately predicting and understanding such traits from genomic data remains challenging. We propose a framework taking advantage of existing large models for gene vectorization to predict habitat specificity from entire microbial genome sequences. Based on our model, we develop attribution techniques to elucidate gene interaction effects that drive microbial adaptation to diverse environments. We train and validate our approach on a large dataset of high quality microbiome genomes from different habitats. We not only demonstrate solid predictive performance, but also how sequence-level information of entire genomes allows us to identify gene associations underlying complex phenotypes. Our attribution recovers known important interaction networks and proposes new candidates for experimental follow up.

MCML Authors
Link to website

Zhufeng Li

Ethics in Systems Design and Machine Learning

Link to Profile Niki Kilbertus

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning


[1324]
P. Ma, L. Rietdorf, D. Kotovenko, V. T. Hu and B. Ommer.
Does VLM Classification Benefit from LLM Description Semantics?
AAAI 2025 - 39th Conference on Artificial Intelligence. Philadelphia, PA, USA, Feb 25-Mar 04, 2025. To be published. Preprint available. arXiv
Abstract

Accurately describing images with text is a foundation of explainable AI. Vision-Language Models (VLMs) like CLIP have recently addressed this by aligning images and texts in a shared embedding space, expressing semantic similarities between vision and language embeddings. VLM classification can be improved with descriptions generated by Large Language Models (LLMs). However, it is difficult to determine the contribution of actual description semantics, as the performance gain may also stem from a semantic-agnostic ensembling effect, where multiple modified text prompts act as a noisy test-time augmentation for the original one. We propose an alternative evaluation scenario to decide if a performance boost of LLM-generated descriptions is caused by such a noise augmentation effect or rather by genuine description semantics. The proposed scenario avoids noisy test-time augmentation and ensures that genuine, distinctive descriptions cause the performance boost. Furthermore, we propose a training-free method for selecting discriminative descriptions that work independently of classname-ensembling effects. Our approach identifies descriptions that effectively differentiate classes within a local CLIP label neighborhood, improving classification accuracy across seven datasets. Additionally, we provide insights into the explainability of description-based image classification with VLMs.

MCML Authors
Link to website

Pingchuan Ma

Machine Vision & Learning

Link to Profile Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[1323]
Y. Zhang, Z. Ma, Y. Ma, Z. Han, Y. Wu and V. Tresp.
WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration.
AAAI 2025 - 39th Conference on Artificial Intelligence. Philadelphia, PA, USA, Feb 25-Mar 04, 2025. To be published. Preprint available. arXiv
Abstract

LLM-based autonomous agents often fail to execute complex web tasks that require dynamic interaction due to the inherent uncertainty and complexity of these environments. Existing LLM-based web agents typically rely on rigid, expert-designed policies specific to certain states and actions, which lack the flexibility and generalizability needed to adapt to unseen tasks. In contrast, humans excel by exploring unknowns, continuously adapting strategies, and resolving ambiguities through exploration. To emulate human-like adaptability, web agents need strategic exploration and complex decision-making. Monte Carlo Tree Search (MCTS) is well-suited for this, but classical MCTS struggles with vast action spaces, unpredictable state transitions, and incomplete information in web tasks. In light of this, we develop WebPilot, a multi-agent system with a dual optimization strategy that improves MCTS to better handle complex web environments. Specifically, the Global Optimization phase involves generating a high-level plan by breaking down tasks into manageable subtasks and continuously refining this plan, thereby focusing the search process and mitigating the challenges posed by vast action spaces in classical MCTS. Subsequently, the Local Optimization phase executes each subtask using a tailored MCTS designed for complex environments, effectively addressing uncertainties and managing incomplete information. Experimental results on WebArena and MiniWoB++ demonstrate the effectiveness of WebPilot. Notably, on WebArena, WebPilot achieves SOTA performance with GPT-4, achieving a 93% relative increase in success rate over the concurrent tree search-based method. WebPilot marks a significant advancement in general autonomous agent capabilities, paving the way for more advanced and reliable decision-making in practical environments.

MCML Authors
Link to website

Yao Zhang

Database Systems & Data Mining

Link to website

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[1322]
D. Tschernutter and S. Feuerriegel.
Data-driven dynamic police patrolling: An efficient Monte Carlo tree search.
European Journal of Operational Research 321.1 (Feb. 2025). DOI
Abstract

Crime is responsible for major financial losses and serious harm to the well-being of individuals, and, hence, a crucial task of police operations is effective patrolling. Yet, in existing decision models aimed at police operations, microscopic routing decisions from patrolling are not considered, and, furthermore, the objective is limited to surrogate metrics (e.g., response time) instead of crime prevention. In this paper, we thus formalize the decision problem of dynamic police patrolling as a Markov decision process that models microscopic routing decisions, so that the expected number of prevented crimes is maximized. We experimentally show that standard solution approaches for our decision problem are not scalable to real-world settings. As a remedy, we present a tailored and highly efficient Monte Carlo tree search algorithm. We then demonstrate our algorithm numerically using real-world crime data from Chicago and show that the decision-making by our algorithm offers significant improvements for crime prevention over patrolling tactics from current practice. Informed by our results, we finally discuss implications for improving the patrolling tactics in police operations.
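At the heart of Monte Carlo tree search is an upper-confidence rule that trades off exploring rarely tried actions against exploiting good ones. A depth-one bandit sketch of that selection rule on an invented two-route patrolling problem (the route names and probabilities are illustrative, not the paper's Chicago data):

```python
import math
import random

def ucb1(total_reward, visits, rounds, c=1.4):
    """UCB1 score: average reward plus an exploration bonus that shrinks
    as an action is tried more often."""
    if visits == 0:
        return float("inf")
    return total_reward / visits + c * math.sqrt(math.log(rounds) / visits)

rng = random.Random(0)
# toy one-step problem: probability that patrolling a route prevents a crime
routes = {"hotspot": 0.7, "quiet": 0.2}
stats = {r: [0.0, 0] for r in routes}   # [total reward, visits]
for t in range(1, 2001):
    choice = max(routes, key=lambda r: ucb1(stats[r][0], stats[r][1], t))
    reward = 1.0 if rng.random() < routes[choice] else 0.0
    stats[choice][0] += reward
    stats[choice][1] += 1
```

A full MCTS applies this rule recursively down a tree of routing decisions and backs simulated rewards up the tree; the sketch shows only the one-step selection behaviour.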

MCML Authors
Link to Profile Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1321]
X.-Y. Tong, R. Dong and X. Zhu.
Global high categorical resolution land cover mapping via weak supervision.
ISPRS Journal of Photogrammetry and Remote Sensing 220 (Feb. 2025). DOI GitHub
Abstract

Land cover information is indispensable for advancing the United Nations’ sustainable development goals, and land cover mapping under a more detailed category system would significantly contribute to economic livelihood tracking and environmental degradation measurement. However, the substantial difficulty in acquiring fine-grained training data makes the implementation of this task particularly challenging. Here, we propose to combine fully labeled source domain and weakly labeled target domain for weakly supervised domain adaptation (WSDA). This is beneficial as the utilization of sparse and coarse weak labels can considerably alleviate the labor required for precise and detailed land cover annotation. Specifically, we introduce the Prototype-based pseudo-label Rectification and Expansion (PRE) approach, which leverages the prototypes (i.e., the class-wise feature centroids) as the bridge to connect sparse labels and global feature distributions. According to the feature distances to the prototypes, the confidence of pseudo-labels predicted in the unlabeled regions of the target domain is assessed. This confidence is then utilized to guide the dynamic expansion and rectification of pseudo-labels. Based on PRE, we carry out high categorical resolution land cover mapping for 10 cities in different regions around the world, severally using PlanetScope, Gaofen-1, and Sentinel-2 satellite images. In the study areas, we achieve cross-sensor, cross-category, and cross-continent WSDA, with the overall accuracy exceeding 80%. The promising results indicate that PRE is capable of reducing the dependency of land cover classification on high-quality annotations, thereby improving label efficiency. We expect our work to enable global fine-grained land cover mapping, which in turn promotes Earth observation to provide more precise and thorough information for environmental monitoring.
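The prototype-based confidence step can be sketched in a few lines: class centroids are computed from the sparse labels, and an unlabeled pixel's pseudo-label confidence falls off with its feature distance to each centroid. The class names and feature values below are invented for illustration:

```python
import math

def prototype(vectors):
    """Class-wise feature centroid."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def pseudo_label_confidence(feature, prototypes):
    """Softmax over negative distances to the prototypes: the closer an
    unlabeled pixel's feature is to a class centroid, the more confident
    the pseudo-label for that class."""
    exps = {c: math.exp(-math.dist(feature, p)) for c, p in prototypes.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

protos = {
    "forest": prototype([[0.9, 0.1], [1.1, -0.1]]),
    "urban": prototype([[-1.0, 0.0], [-1.2, 0.2]]),
}
conf = pseudo_label_confidence([0.8, 0.0], protos)
```

Thresholding such confidences decides which pseudo-labels to expand into unlabeled regions and which to rectify, per the PRE description above.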

MCML Authors
Link to Profile Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[1320]
R. Litschko, O. Kraus, V. Blaschke and B. Plank.
Cross-Dialect Information Retrieval: Information Access in Low-Resource and High-Variance Languages.
COLING 2025 - The 31st International Conference on Computational Linguistics. Abu Dhabi, United Arab Emirates, Jan 19-24, 2025. URL
Abstract

A large amount of local and culture-specific knowledge (e.g., people, traditions, food) can only be found in documents written in dialects. While there has been extensive research conducted on cross-lingual information retrieval (CLIR), the field of cross-dialect retrieval (CDIR) has received limited attention. Dialect retrieval poses unique challenges due to the limited availability of resources to train retrieval models and the high variability in non-standardized languages. We study these challenges using the example of German dialects and introduce the first German dialect retrieval dataset, dubbed WikiDIR, which consists of seven German dialects extracted from Wikipedia. Using WikiDIR, we demonstrate the weakness of lexical methods in dealing with high lexical variation in dialects. We further show that the commonly used zero-shot cross-lingual transfer approach with multilingual encoders does not transfer well to extremely low-resource setups, motivating the need for resource-lean and dialect-specific retrieval models. We finally demonstrate that (document) translation is an effective way to reduce the dialect gap in CDIR.

MCML Authors
Link to website

Robert Litschko

Artificial Intelligence and Computational Linguistics

Link to website

Verena Blaschke

Artificial Intelligence and Computational Linguistics

Link to Profile Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1319]
Y. Liu, C. Ma, H. Ye and H. Schütze.
TransMI: A Framework to Create Strong Baselines from Multilingual Pretrained Language Models for Transliterated Data.
COLING 2025 - The 31st International Conference on Computational Linguistics. Abu Dhabi, United Arab Emirates, Jan 19-24, 2025. URL GitHub
Abstract

Transliterating related languages that use different scripts into a common script is effective in improving crosslingual transfer in downstream tasks. However, this methodology often makes pretraining a model from scratch unavoidable, as transliteration brings about new subwords not covered in existing multilingual pretrained language models (mPLMs). This is undesirable because pretraining requires a large computation budget. A more promising way is to make full use of available mPLMs. To this end, this paper proposes a simple but effective framework: Transliterate-Merge-Initialize (TransMI), which can create a strong baseline well-suited for data that is transliterated into a common script by exploiting an mPLM and its accompanying tokenizer. TransMI has three stages: (a) transliterate the vocabulary of an mPLM into a common script; (b) merge the new vocabulary with the original vocabulary; and (c) initialize the embeddings of the new subwords. We applied TransMI to three recent strong mPLMs, and our experiments demonstrate that TransMI not only preserves their ability to handle non-transliterated data, but also enables the models to effectively process transliterated data: the results show a consistent improvement of 3% to 34%, varying across different models and tasks.
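The three TransMI stages can be mimicked on a toy vocabulary. The character table below is an invented Cyrillic-to-Latin mapping, and copy-initialization is one simple choice for stage (c); neither is claimed to be the paper's exact scheme:

```python
def transliterate(token):
    """Toy Cyrillic-to-Latin mapping (illustrative, not a real scheme)."""
    table = str.maketrans({"д": "d", "о": "o", "м": "m", "а": "a"})
    return token.translate(table)

def transliterate_merge_initialize(embeddings):
    """Sketch of the three stages: (a) transliterate each subword,
    (b) merge new subwords with the original vocabulary, and
    (c) initialize new embeddings from their source subword."""
    merged = dict(embeddings)            # (b) keep the original vocabulary
    for token, emb in embeddings.items():
        new = transliterate(token)       # (a) transliterate
        if new not in merged:
            merged[new] = list(emb)      # (c) copy-initialize
    return merged

vocab = {"дом": [0.1, 0.2], "dom": [0.3, 0.4], "мода": [0.5, 0.6]}
merged_vocab = transliterate_merge_initialize(vocab)
```

Because the original entries are kept, the merged model still handles non-transliterated text — the property the abstract highlights.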

MCML Authors
Link to website

Yihong Liu

Statistical NLP and Deep Learning

Link to website

Chunlan Ma

Statistical NLP and Deep Learning

Link to website

Haotian Ye

Statistical NLP and Deep Learning

Link to Profile Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1318]
Y. Liu, M. Wang, A. H. Kargaran, A. Imani, O. Xhelili, H. Ye, C. Ma, F. Yvon and H. Schütze.
How Transliterations Improve Crosslingual Alignment.
COLING 2025 - The 31st International Conference on Computational Linguistics. Abu Dhabi, United Arab Emirates, Jan 19-24, 2025. URL
Abstract

Recent studies have shown that post-aligning multilingual pretrained language models (mPLMs) using alignment objectives on both original and transliterated data can improve crosslingual alignment. This improvement further leads to better crosslingual transfer performance. However, it remains unclear how and why a better crosslingual alignment is achieved, as this technique only involves transliterations, and does not use any parallel data. This paper attempts to explicitly evaluate the crosslingual alignment and identify the key elements in transliteration-based approaches that contribute to better performance. For this, we train multiple models under varying setups for two pairs of related languages: (1) Polish and Ukrainian and (2) Hindi and Urdu. To assess alignment, we define four types of similarities based on sentence representations. Our experiments show that adding transliterations alone improves the overall similarities, even for random sentence pairs. With the help of auxiliary alignment objectives, especially the contrastive objective, the model learns to distinguish matched from random pairs, leading to better alignments. However, we also show that better alignment does not always yield better downstream performance, suggesting that further research is needed to clarify the connection between alignment and performance.

MCML Authors
Link to website

Yihong Liu

Statistical NLP and Deep Learning

Link to website

Mingyang Wang

Statistical NLP and Deep Learning

Link to website

Amir Hossein Kargaran

Statistical NLP and Deep Learning

Link to website

Ayyoob Imani

Statistical NLP and Deep Learning

Link to website

Haotian Ye

Statistical NLP and Deep Learning

Link to website

Chunlan Ma

Statistical NLP and Deep Learning

Link to Profile Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1317]
A. Muñoz-Ortiz, V. Blaschke and B. Plank.
Evaluating Pixel Language Models on Non-Standardized Languages.
COLING 2025 - The 31st International Conference on Computational Linguistics. Abu Dhabi, United Arab Emirates, Jan 19-24, 2025.
Abstract

We explore the potential of pixel-based models for transfer learning from standard languages to dialects. These models convert text into images that are divided into patches, enabling a continuous vocabulary representation that proves especially useful for out-of-vocabulary words common in dialectal data. Using German as a case study, we compare the performance of pixel-based models to token-based models across various syntactic and semantic tasks. Our results show that pixel-based models outperform token-based models in part-of-speech tagging, dependency parsing and intent detection for zero-shot dialect evaluation by up to 26 percentage points in some scenarios, though not in Standard German. However, pixel-based models fall short in topic classification. These findings emphasize the potential of pixel-based models for handling dialectal data, though further research should be conducted to assess their effectiveness in various linguistic contexts.

MCML Authors
Link to website

Verena Blaschke

Artificial Intelligence and Computational Linguistics

Link to Profile Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1316]
Y. Zhang, V. Hangya and A. Fraser.
LLM Sensitivity Challenges in Abusive Language Detection: Instruction-Tuned vs. Human Feedback.
COLING 2025 - The 31st International Conference on Computational Linguistics. Abu Dhabi, United Arab Emirates, Jan 19-24, 2025. URL
Abstract

The capacity of large language models (LLMs) to understand and distinguish socially unacceptable texts enables them to play a promising role in abusive language detection. However, various factors can affect their sensitivity. In this work, we test whether LLMs have an unintended bias in abusive language detection, i.e., whether they predict more or less of a given abusive class than expected in zero-shot settings. Our results show that instruction-tuned LLMs tend to under-predict positive classes, since datasets used for tuning are dominated by the negative class. In contrast, models fine-tuned with human feedback tend to be overly sensitive. In an exploratory approach to mitigate these issues, we show that label frequency in the prompt helps mitigate the significant over-prediction.

MCML Authors
Link to Profile Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[1315]
E. Garces Arias, M. Li, C. Heumann and M. Aßenmacher.
Decoding Decoded: Understanding Hyperparameter Effects in Open-Ended Text Generation.
COLING 2025 - The 31st International Conference on Computational Linguistics. Abu Dhabi, United Arab Emirates, Jan 19-24, 2025. To be published. Preprint available. arXiv
Abstract

Decoding strategies for large language models (LLMs) are a critical but often underexplored aspect of text generation tasks. Since LLMs produce probability distributions over the entire vocabulary, various decoding methods have been developed to transform these probabilities into coherent and fluent text, each with its own set of hyperparameters. In this study, we present a large-scale, comprehensive analysis of how hyperparameter selection affects text quality in open-ended text generation across multiple LLMs, datasets, and evaluation metrics. Through an extensive sensitivity analysis, we provide practical guidelines for hyperparameter tuning and demonstrate the substantial influence of these choices on text quality. Using three established datasets, spanning factual domains (e.g., news) and creative domains (e.g., fiction), we show that hyperparameter tuning significantly impacts generation quality, though its effects vary across models and tasks. We offer in-depth insights into these effects, supported by both human evaluations and a synthesis of widely-used automatic evaluation metrics.
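Two of the decoding hyperparameters such studies vary — temperature and nucleus (top-p) truncation — compose as follows. This is a generic sketch of the standard technique over a toy vocabulary, not the paper's evaluation code:

```python
import math
import random

def sample_next(logits, temperature=1.0, top_p=1.0, rng=None):
    """Temperature-scale the logits, keep the smallest set of tokens whose
    cumulative probability reaches top_p, then sample from that nucleus."""
    rng = rng or random.Random(0)
    tokens = list(logits)
    scaled = [logits[tok] / temperature for tok in tokens]
    m = max(scaled)                              # for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    ranked = sorted(zip(tokens, (e / z for e in exps)), key=lambda kv: -kv[1])
    nucleus, cumulative = [], 0.0
    for tok, p in ranked:
        nucleus.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    threshold = rng.random() * sum(p for _, p in nucleus)
    acc = 0.0
    for tok, p in nucleus:
        acc += p
        if acc >= threshold:
            return tok
    return nucleus[-1][0]

logits = {"the": 5.0, "a": 2.0, "zebra": -3.0}
```

Low temperature sharpens the distribution and small top-p prunes the tail, so both push generation toward the most likely continuations; the paper studies how such choices interact with quality across models and domains.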

MCML Authors
Link to website

Esteban Garces Arias

Statistical Learning & Data Science

Link to website

Matthias Aßenmacher

Dr.

Statistical Learning & Data Science


[1314]
V. Blaschke, F. Körner and B. Plank.
Add Noise, Tasks, or Layers? MaiNLP at the VarDial 2025 Shared Task on Norwegian Dialectal Slot and Intent Detection.
VarDial @COLING 2025 - 12th Workshop on NLP for Similar Languages, Varieties and Dialects at the 31st International Conference on Computational Linguistics (COLING 2025). Abu Dhabi, United Arab Emirates, Jan 19-24, 2025. URL
Abstract

Slot and intent detection (SID) is a classic natural language understanding task. Despite this, research has only more recently begun focusing on SID for dialectal and colloquial varieties. Many approaches for low-resource scenarios have not yet been applied to dialectal SID data, or compared to each other on the same datasets. We participate in the VarDial 2025 shared task on slot and intent detection in Norwegian varieties, and compare multiple set-ups: varying the training data (English, Norwegian, or dialectal Norwegian), injecting character-level noise, training on auxiliary tasks, and applying Layer Swapping, a technique in which layers of models fine-tuned on different datasets are assembled into a model. We find noise injection to be beneficial while the effects of auxiliary tasks are mixed. Though some experimentation was required to successfully assemble a model from layers, it worked surprisingly well; a combination of models trained on English and small amounts of dialectal data produced the most robust slot predictions. Our best models achieve 97.6% intent accuracy and 85.6% slot F1 in the shared task.
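Character-level noise injection of the kind found beneficial here can be sketched as follows (a toy illustration with made-up edit operations and rates, not the authors' implementation):

```python
import random

def inject_char_noise(text, noise_rate=0.1, rng=None):
    """Randomly delete, duplicate, or swap alphabetic characters,
    a simple way to make models robust to non-standard spellings."""
    rng = rng or random.Random(0)  # deterministic default for reproducibility
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < noise_rate:
            op = rng.choice(["delete", "duplicate", "swap"])
            if op == "duplicate":
                out.extend([c, c])
            elif op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], c])
                i += 1  # the swapped neighbour is consumed too
            elif op == "delete":
                pass  # drop the character
            else:
                out.append(c)  # swap at end of string: keep as-is
        else:
            out.append(c)
        i += 1
    return "".join(out)
```

Applied to standard-language training data, such perturbations roughly mimic dialectal spelling variation without requiring any dialectal annotations.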

MCML Authors
Link to website

Verena Blaschke

Artificial Intelligence and Computational Linguistics

Link to website

Felicia Körner

Artificial Intelligence and Computational Linguistics

Link to Profile Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1313]
X. Krückl, V. Blaschke and B. Plank.
Improving Dialectal Slot and Intent Detection with Auxiliary Tasks: A Multi-Dialectal Bavarian Case Study.
VarDial @COLING 2025 - 12th Workshop on NLP for Similar Languages, Varieties and Dialects at the 31st International Conference on Computational Linguistics (COLING 2025). Abu Dhabi, United Arab Emirates, Jan 19-24, 2025. URL
Abstract

Reliable slot and intent detection (SID) is crucial in natural language understanding for applications like digital assistants. Encoder-only transformer models fine-tuned on high-resource languages generally perform well on SID. However, they struggle with dialectal data, where no standardized form exists and training data is scarce and costly to produce. We explore zero-shot transfer learning for SID, focusing on multiple Bavarian dialects, for which we release a new dataset for the Munich dialect. We evaluate models trained on auxiliary tasks in Bavarian, and compare joint multi-task learning with intermediate-task training. We also compare three types of auxiliary tasks: token-level syntactic tasks, named entity recognition (NER), and language modelling. We find that the included auxiliary tasks have a more positive effect on slot filling than intent classification (with NER having the most positive effect), and that intermediate-task training yields more consistent performance gains. Our best-performing approach improves intent classification performance on Bavarian dialects by 5.1 and slot filling F1 by 8.4 percentage points.

MCML Authors
Link to website

Verena Blaschke

Artificial Intelligence and Computational Linguistics

Link to Profile Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1312]
A.-M. Lutgen, A. Plum, C. Purschke and B. Plank.
Neural Text Normalization for Luxembourgish Using Real-Life Variation Data.
VarDial @COLING 2025 - 12th Workshop on NLP for Similar Languages, Varieties and Dialects at the 31st International Conference on Computational Linguistics (COLING 2025). Abu Dhabi, United Arab Emirates, Jan 19-24, 2025. URL
Abstract

Orthographic variation is very common in Luxembourgish texts due to the absence of a fully-fledged standard variety. Additionally, developing NLP tools for Luxembourgish is a difficult task given the lack of annotated and parallel data, which is exacerbated by ongoing standardization. In this paper, we propose the first sequence-to-sequence normalization models using the ByT5 and mT5 architectures with training data obtained from word-level real-life variation data. We perform a fine-grained, linguistically-motivated evaluation to test byte-based, word-based and pipeline-based models for their strengths and weaknesses in text normalization. We show that our sequence model using real-life variation data is an effective approach for tailor-made normalization in Luxembourgish.

MCML Authors
Link to Profile Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1311]
A. Sanin, J. K. Flowers, T. H. Piotrowiak, F. Felsen, L. Merker, A. Ludwig, D. Bresser and H. S. Stein.
Integrating Automated Electrochemistry and High-Throughput Characterization with Machine Learning to Explore Si─Ge─Sn Thin-Film Lithium Battery Anodes.
Advanced Energy Materials Early View.2404961 (Jan. 2025). DOI
Abstract

High-performance batteries need accelerated discovery and optimization of new anode materials. Herein, we explore the Si─Ge─Sn ternary alloy system as a candidate fast-charging anode materials system by utilizing a scanning droplet cell (SDC) as an autonomous electrochemical characterization tool with the goal of subsequent upscaling. As the SDC is performing experiments sequentially, an exploration of the entire ternary space is unfeasible due to time constraints. Thus, closed-loop optimization, guided by real-time data analysis and sequential learning algorithms, is utilized to direct experiments. The lead material identified is scaled up to a coin cell to validate the findings from the autonomous millimeter-scale thin-film electrochemical experimentation. Explainable machine learning (ML) models incorporating data from high-throughput Raman spectroscopy and X-ray diffraction (XRD) are used to elucidate the effect of short and long-range ordering on material performance.

MCML Authors
Link to Profile Helge S. Stein

Helge S. Stein

Prof. Dr.

Digital Catalysis


[1310]
F. Bortolussi, H. Sandström, F. Partovi, J. Mikkilä, P. Rinke and M. Rissanen.
Technical note: Towards atmospheric compound identification in chemical ionization mass spectrometry with pesticide standards and machine learning.
Atmospheric Chemistry and Physics 25.1 (Jan. 2025). DOI
Abstract

Chemical ionization mass spectrometry (CIMS) is widely used in atmospheric chemistry studies. However, due to the complex interactions between reagent ions and target compounds, chemical understanding remains limited and compound identification difficult. In this study, we apply machine learning to a reference dataset of pesticides in two standard solutions to build a model that can provide insights from CIMS analyses in atmospheric science. The CIMS measurements were performed with an Orbitrap mass spectrometer coupled to a thermal desorption multi-scheme chemical ionization inlet unit (TD-MION-MS) with both negative and positive ionization modes utilizing Br−, , H3O+ and (CH3)2COH+ (AceH+) as reagent ions. We then trained two machine learning methods on these data: (1) random forest (RF) for classifying if a pesticide can be detected with CIMS and (2) kernel ridge regression (KRR) for predicting the expected CIMS signals. We compared their performance on five different representations of the molecular structure: the topological fingerprint (TopFP), the molecular access system keys (MACCS), a custom descriptor based on standard molecular properties (RDKitPROP), the Coulomb matrix (CM) and the many-body tensor representation (MBTR). The results indicate that MACCS outperforms the other descriptors. Our best classification model reaches a prediction accuracy of 0.85 ± 0.02 and a receiver operating characteristic curve area of 0.91 ± 0.01. Our best regression model reaches an accuracy of 0.44 ± 0.03 logarithmic units of the signal intensity. Subsequent feature importance analysis of the classifiers reveals that the most important sub-structures are NH and OH for the negative ionization schemes and nitrogen-containing groups for the positive ionization schemes.

MCML Authors
Link to Profile Patrick Rinke

Patrick Rinke

Prof. Dr.

AI-based Material Science


[1309]
S. Grosu, M. P. Fabritius, M. Winkelmann, D. Puhr-Westerheide, M. Ingenerf, S. Maurus, A. Graser, C. Schulz, T. Knösel, C. C. Cyran, J. Ricke, P. M. Kazmierczak, M. Ingrisch and P. Wesp.
Effect of artificial intelligence-aided differentiation of adenomatous and non-adenomatous colorectal polyps at CT colonography on radiologists’ therapy management.
European Radiology Early Access (Jan. 2025). DOI
Abstract

Objectives: Adenomatous colorectal polyps require endoscopic resection, as opposed to non-adenomatous hyperplastic colorectal polyps. This study aims to evaluate the effect of artificial intelligence (AI)-assisted differentiation of adenomatous and non-adenomatous colorectal polyps at CT colonography on radiologists’ therapy management.
Materials and methods: Five board-certified radiologists retrospectively evaluated CT colonography images with colorectal polyps of all sizes and morphologies and decided whether the depicted polyps required endoscopic resection. After a primary unassisted reading based on current guidelines, a second reading was performed with access to the classification of a radiomics-based random-forest AI model labelling each polyp as ‘non-adenomatous’ or ‘adenomatous’. Performance was evaluated using polyp histopathology as the reference standard.
Results: 77 polyps in 59 patients comprising 118 polyp image series (47% supine position, 53% prone position) were evaluated unassisted and AI-assisted by five independent board-certified radiologists, resulting in a total of 1180 readings (subsequent polypectomy: yes or no). AI-assisted readings had higher accuracy (76 ± 1% vs. 84 ± 1%), sensitivity (78 ± 6% vs. 85 ± 1%), and specificity (73 ± 8% vs. 82 ± 2%) in selecting polyps eligible for polypectomy (p < 0.001). Inter-reader agreement was improved in the AI-assisted readings (Fleiss’ kappa 0.69 vs. 0.92).
Conclusion: AI-based characterisation of colorectal polyps at CT colonography as a second reader might enable a more precise selection of polyps eligible for subsequent endoscopic resection. However, further studies are needed to confirm this finding and histopathologic polyp evaluation is still mandatory.
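Fleiss’ kappa, the inter-reader agreement statistic reported above, can be computed from per-polyp rating counts. A small self-contained sketch (not the study’s code; assumes the same number of raters per item):

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for agreement among multiple raters.

    `ratings` is a list of per-item category counts, e.g. [4, 1] means
    four of five readers chose category 0 and one chose category 1.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    categories = len(ratings[0])
    # Observed agreement: mean pairwise agreement per item.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ) / n_items
    # Chance agreement from the marginal category proportions.
    totals = [sum(row[j] for row in ratings) for j in range(categories)]
    p_j = [t / (n_items * n_raters) for t in totals]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)
```

Perfect agreement yields kappa = 1; values near 0.9, as in the AI-assisted readings, indicate near-unanimous decisions.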

MCML Authors
Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to website

Philipp Wesp

Dr.

Clinical Data Science in Radiology


[1308]
F. Tian, H. Zhang, Y. Tan, L. Zhu, L. Shen, K. Qian, B. Hu, B. W. Schuller and Y. Yamamoto.
An On-Board Executable Multi-Feature Transfer-Enhanced Fusion Model for Three-Lead EEG Sensor-Assisted Depression Diagnosis.
IEEE Journal of Biomedical and Health Informatics 29.1 (Jan. 2025). DOI
Abstract

The development of affective computing and medical electronic technologies has led to the emergence of Artificial Intelligence (AI)-based methods for the early detection of depression. However, previous studies have often overlooked the necessity for the AI-assisted diagnosis system to be wearable and accessible in practical scenarios for depression recognition. In this work, we present an on-board executable multi-feature transfer-enhanced fusion model for our custom-designed wearable three-lead Electroencephalogram (EEG) sensor, based on EEG data collected from 73 depressed patients and 108 healthy controls. Experimental results show that the proposed model exhibits low-computational complexity (65.0 K parameters), promising Floating-Point Operations (FLOPs) performance (25.6 M), real-time processing (1.5 s/execution), and low power consumption (320.8 mW). Furthermore, it requires only 202.0 KB of Random Access Memory (RAM) and 279.6 KB of Read-Only Memory (ROM) when deployed on the EEG sensor. Despite its low computational and spatial complexity, the model achieves a notable classification accuracy of 95.2%, specificity of 94.0%, and sensitivity of 96.9% under independent test conditions. These results underscore the potential of deploying the model on the wearable three-lead EEG sensor for assisting in the diagnosis of depression.

MCML Authors
Link to Profile Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[1307]
J. Beck, L. M. Kemeter, K. Dürrbeck, M. H. I. Abdalla and F. Kreuter.
Towards integrating ChatGPT into satellite image annotation workflows. A comparison of label quality and costs of human and automated annotators.
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan. 2025). To be published.
Abstract

High-quality annotations are a critical success factor for machine learning (ML) applications. To achieve this, we have traditionally relied on human annotators, navigating the challenges of limited budgets and of varying task-specific expertise, costs, and availability. Since the emergence of Large Language Models (LLMs), their popularity for generating automated annotations has grown, extending the possibilities and complexity of designing an efficient annotation strategy. Increasingly, computer vision capabilities have been integrated into general-purpose LLMs like ChatGPT. This raises the question of how effectively LLMs can be used in satellite image annotation tasks and how they compare to traditional annotator types. This study presents a comprehensive investigation and comparison of various human and automated annotators for image classification. We evaluate the feasibility and economic competitiveness of the ChatGPT4-V model for a complex land-usage annotation task and compare it with alternative human annotators. A set of satellite images is annotated by a domain expert and 15 additional human and automated annotators, differing in expertise and costs. Our analyses examine the annotation quality loss between the expert and the other annotators. This comparison is conducted through (1) descriptive analyses, (2) fitting linear probability models, and (3) comparing F1-scores. Finally, we simulate annotation strategies in which samples are split according to an automatically assigned certainty score. Routing low-certainty images to human annotators can cut total annotation costs by over 50% with minimal impact on label quality. We discuss implications regarding the economic competitiveness of annotation strategies, prompt engineering, and the task-specificity of expertise.
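The certainty-based routing simulated in the study can be illustrated with a toy cost model (prices, threshold, and function name are hypothetical, not taken from the paper):

```python
def annotation_cost(certainties, cost_llm=0.01, cost_human=0.50, threshold=0.8):
    """Total cost when images the LLM is uncertain about (certainty below
    `threshold`) are additionally routed to a human annotator.

    The LLM cost is paid for every image, since the certainty score
    comes from the automated annotation itself.
    Returns (total_cost, share_routed_to_humans).
    """
    n_human = sum(1 for c in certainties if c < threshold)
    n_llm_only = len(certainties) - n_human
    total = n_llm_only * cost_llm + n_human * (cost_llm + cost_human)
    return total, n_human / len(certainties)
```

Sweeping `threshold` trades label quality against cost: at 0 everything stays with the cheap automated annotator, at 1 every image also incurs the human rate.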

MCML Authors
Link to website

Jacob Beck

Social Data Science and AI Lab

Link to Profile Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab


[1306]
A. Akman, Q. Sun and B. W. Schuller.
Improving Audio Explanations using Audio Language Models.
IEEE Signal Processing Letters Early Access (Jan. 2025). DOI
Abstract

Foundation models are widely utilised for their strong representational capabilities, driven by training on extensive datasets with self-supervised learning. The increasing complexity of these models highlights the importance of interpretability to enhance transparency and improve human understanding of their decision-making processes. Most existing interpretability methods explain model behaviour by attributing importance to individual data elements across different layers, based on their influence on the final prediction. By removing less important features, these approaches emphasise only the most relevant ones and overlook the broader representational space. In this study, we propose a novel framework for explanation generation that serves as an alternative to feature removal, offering a more comprehensive understanding of model behaviour. Our framework leverages the generative abilities of audio language models to replace removed features with contextually appropriate alternatives, providing a more complete view of the model’s decision-making process. Through extensive evaluations on standard benchmarks, including keyword spotting and speech emotion recognition, our approach demonstrates its effectiveness in generating high-quality audio explanations.

MCML Authors
Link to Profile Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[1305]
Y. Sun, Y. Zhou, X. Xu, J. Qi, F. Xu, Z. Ren and B. W. Schuller.
Weakly-Supervised Depression Detection in Speech Through Self-Learning Based Label Correction.
IEEE Transactions on Audio, Speech and Language Processing Early Access (Jan. 2025). DOI
Abstract

Automated Depression Detection (ADD) in speech aims to automatically estimate a person’s depressive attributes by applying artificial intelligence tools to spoken signals. Nevertheless, existing speech-based ADD works fail to sufficiently consider weakly-supervised cases with inaccurate labels, which may typically appear in intelligent mental health applications. In this regard, we propose the Self-Learning-based Label Correction (SLLC) approach for weakly-supervised depression detection in speech. The proposed approach employs a self-learning manner connecting a label correction module and a depression detection module. Within the approach, the label correction module fuses likelihood-ratio-based and prototype-based label correction strategies in order to effectively correct the inaccurate labels, while the depression detection module detects depressed samples through a 1D convolutional recurrent neural network with multiple types of losses. The experimental results on two depression detection corpora show that our proposed SLLC approach outperforms existing state-of-the-art speech-based depression detection approaches in the case of weak supervision with inaccurate labels.
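A prototype-based label-correction strategy of the kind named above can be sketched in a few lines (toy 1-D features and a hypothetical function name; not the SLLC implementation):

```python
def correct_labels(features, labels):
    """Prototype-based label correction: compute a mean feature vector
    (prototype) per class, then relabel each sample to the class of its
    nearest prototype."""
    dims = len(features[0])
    protos = {}
    for cls in set(labels):
        members = [f for f, l in zip(features, labels) if l == cls]
        protos[cls] = [sum(v[d] for v in members) / len(members) for d in range(dims)]

    def dist2(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(protos, key=lambda c: dist2(f, protos[c])) for f in features]
```

A sample whose noisy label places it far from its own class centroid gets pulled back to the cluster it actually resembles, which is the intuition behind correcting inaccurate depression labels before training.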

MCML Authors
Link to Profile Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[1304]
W. Huang, Z. Gu, Y. Shi, Z. Xiong and X. Zhu.
Semi-Supervised Building Footprint Extraction Using Debiased Pseudo-Labels.
IEEE Transactions on Geoscience and Remote Sensing 63 (Jan. 2025). DOI GitHub
Abstract

Accurate extraction of building footprints from satellite imagery is of high value. Currently, deep learning methods are predominant in this field due to their powerful representation capabilities. However, they generally require extensive pixel-wise annotations, which constrains their practical application. Semi-supervised learning (SSL) significantly mitigates this requirement by leveraging large volumes of unlabeled data for model self-training (ST), thus enhancing the viability of building footprint extraction. Despite its advantages, SSL faces a critical challenge: the imbalanced distribution between the majority background class and the minority building class, which often results in model bias toward the background during training. To address this issue, this article introduces a novel method called DeBiased matching (DBMatch) for semi-supervised building footprint extraction. DBMatch comprises three main components: 1) a basic supervised learning module (SUP) that uses labeled data for initial model training; 2) a classical weak-to-strong ST module that generates pseudo-labels from unlabeled data for further model ST; and 3) a novel logit debiasing (LDB) module that calculates a global logit bias between building and background, allowing for dynamic pseudo-label calibration. To verify the effectiveness of the proposed DBMatch, extensive experiments are performed on three public building footprint extraction datasets covering six global cities in the SSL setting. The experimental results demonstrate that our method significantly outperforms some advanced SSL methods in semi-supervised building footprint extraction.
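The idea of calibrating pseudo-labels with a global logit bias can be illustrated as follows (a simplified per-pixel stand-in for the LDB module, not the paper’s exact formulation):

```python
def debias_pseudo_labels(building_logits, background_logits):
    """Calibrate pseudo-labels by removing the global logit bias toward
    the majority background class before taking the arg-max."""
    # Global bias: how much higher background logits are on average.
    n = len(building_logits)
    bias = sum(bg - bu for bu, bg in zip(building_logits, background_logits)) / n
    labels = []
    for bu, bg in zip(building_logits, background_logits):
        # Shift building logits up by the measured bias, then decide.
        labels.append("building" if bu + bias > bg else "background")
    return labels
```

Without the shift, borderline building pixels would all be pseudo-labelled as background, reinforcing the class imbalance during self-training.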

MCML Authors
Link to website

Ziqi Gu

Data Science in Earth Observation

Link to Profile Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[1303]
J. Li, T. Su, B. Zhao, F. Lv, Q. Wang, N. Navab, Y. Hu and Z. Jiang.
Ultrasound Report Generation With Cross-Modality Feature Alignment via Unsupervised Guidance.
IEEE Transactions on Medical Imaging 44.1 (Jan. 2025). DOI
Abstract

Automatic report generation has arisen as a significant research area in computer-aided diagnosis, aiming to alleviate the burden on clinicians by generating reports automatically based on medical images. In this work, we propose a novel framework for automatic ultrasound report generation, leveraging a combination of unsupervised and supervised learning methods to aid the report generation process. Our framework incorporates unsupervised learning methods to extract potential knowledge from ultrasound text reports, serving as the prior information to guide the model in aligning visual and textual features, thereby addressing the challenge of feature discrepancy. Additionally, we design a global semantic comparison mechanism to enhance the performance of generating more comprehensive and accurate medical reports. To enable the implementation of ultrasound report generation, we constructed three large-scale ultrasound image-text datasets from different organs for training and validation purposes. Extensive evaluations with other state-of-the-art approaches exhibit its superior performance across all three datasets.

MCML Authors
Link to website

Jun Li

Computational Imaging and AI in Medicine

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Zhongliang Jiang

Dr.

Computer Aided Medical Procedures & Augmented Reality


[1302]
W. Mayr, A. Triantafyllopoulos, A. Batliner, B. W. Schuller and T. M. Berghaus.
Assessing the Clinical and Functional Status of COPD Patients Using Speech Analysis During and After Exacerbation.
International Journal of Chronic Obstructive Pulmonary Disease 20 (Jan. 2025). DOI
Abstract

Background: Chronic obstructive pulmonary disease (COPD) affects breathing, speech production, and coughing. We evaluated a machine learning analysis of speech for classifying the disease severity of COPD.
Methods: In this single centre study, non-consecutive COPD patients were prospectively recruited for comparing their speech characteristics during and after an acute COPD exacerbation. We extracted a set of spectral, prosodic, and temporal variability features, which were used as input to a support vector machine (SVM). Our baseline for predicting patient state was an SVM model using self-reported BORG and COPD Assessment Test (CAT) scores.
Results: In 50 COPD patients (52% males, 22% GOLD II, 44% GOLD III, 32% GOLD IV, all patients group E), speech analysis was superior to BORG and CAT scores alone in distinguishing between the during- and after-exacerbation status, achieving 84% prediction accuracy. CAT scores correlated with reading rhythm, and BORG scales with stability in articulation. Pulmonary function testing (PFT) correlated with speech pause rate and speech rhythm variability.
Conclusion: Speech analysis may be a viable technology for classifying COPD status, opening up new opportunities for remote disease monitoring.

MCML Authors
Link to website

Andreas Triantafyllopoulos

Health Informatics

Link to website

Anton Batliner

Dr.

Health Informatics

Link to Profile Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[1301]
B. Lange.
Moral parenthood and gestation: replies to Cordeiro, Murphy, Robinson and Baron.
Journal of Medical Ethics 51.2 (Jan. 2025). DOI
Abstract

I am grateful to James Cordeiro, Timothy Murphy, Heloise Robinson and Teresa Baron for their perceptive and stimulating comments on my article in this journal. In what follows, I seek to respond to some of the main points raised in each commentary.

MCML Authors
Link to Profile Ben Lange

Ben Lange

Dr.

Ethics of Artificial Intelligence


[1300]
B. Lange.
Moral parenthood: not gestational.
Journal of Medical Ethics 51.2 (Jan. 2025). DOI
Abstract

Parenting our biological children is a centrally important matter, but how, if at all, can it be justified? According to a contemporary influential line of thinking, the acquisition by parents of a moral right to parent their biological children should be grounded by appeal to the value of the intimate emotional relationship that gestation facilitates between a newborn and a gestational procreator. I evaluate two arguments in defence of this proposal and argue that both are unconvincing.

MCML Authors
Link to Profile Ben Lange

Ben Lange

Dr.

Ethics of Artificial Intelligence


[1299]
A. Bitarafan, M. Mozafari, M. F. Azampour, M. S. Baghshah, N. Navab and A. Farshad.
Self-supervised 3D medical image segmentation by flow-guided mask propagation learning.
Medical Image Analysis Journal Pre-proof.103478 (Jan. 2025). DOI GitHub
Abstract

Despite significant progress in 3D medical image segmentation using deep learning, manual annotation remains a labor-intensive bottleneck. Self-supervised mask propagation (SMP) methods have emerged to alleviate this challenge, allowing intra-volume segmentation with just a single slice annotation. However, previous SMP methods often rely on 2D information and ignore volumetric contexts. While our previous work, called Vol2Flow, attempts to address this concern, it exhibits limitations, including not focusing enough on local (i.e., slice-pair) information, neglecting global information (i.e., volumetric contexts) in the objective function, and error accumulation during slice-to-slice reconstruction. This paper introduces Flow2Mask, a novel SMP method developed to overcome the limitations of previous SMP approaches, particularly Vol2Flow. During training, Flow2Mask uses the proposed Local-to-Global (L2G) loss to learn inter-slice flow fields among all consecutive slices within a volume in an unsupervised manner. This dynamic loss is based on curriculum learning to gradually learn information within a volume from local to global contexts. Additionally, the Inter-Slice Smoothness (ISS) loss is introduced as a regularization term to encourage changes between slices to occur consistently and continuously. During inference, Flow2Mask leverages these 3D flow fields for inter-slice mask propagation in a 3D image, spreading the annotation from a single annotated slice to the entire volume. Moreover, we propose an automatic strategy to select the most representative slice as the initial annotation in the mask propagation process. Experimental evaluations on different abdominal datasets demonstrate that our proposed SMP method outperforms previous approaches and improves the overall mean DSC of Vol2Flow by +2.1%, +8.2%, and +4.0% on the Sliver, CHAOS, and 3D-IRCAD datasets, respectively. Furthermore, Flow2Mask even exhibits substantial improvements in weakly-supervised and self-supervised few-shot segmentation methods when applied as a mask completion tool.
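The Dice similarity coefficient (DSC) used to report these gains can be computed from two binary masks, here represented as sets of voxel coordinates (a generic sketch, not the paper’s evaluation code):

```python
def dice_score(pred_mask, gt_mask):
    """Dice similarity coefficient between two binary masks given as
    sets of voxel coordinates: 2|A∩B| / (|A| + |B|)."""
    pred = set(pred_mask)
    gt = set(gt_mask)
    if not pred and not gt:
        return 1.0  # both empty: perfect (vacuous) agreement
    return 2 * len(pred & gt) / (len(pred) + len(gt))
```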

MCML Authors
Link to website

Mohammad Farid Azampour

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality


[1298]
T. Li, S. Hofer, G. Moholdt, A. Igneczi, K. Heidler, X. Zhu and J. Bamber.
Pervasive glacier retreats across Svalbard from 1985 to 2023.
Nature Communications 16.705 (Jan. 2025). DOI
Abstract

A major uncertainty in predicting the behaviour of marine-terminating glaciers is ice dynamics driven by non-linear calving front retreat, which is poorly understood and modelled. Using 124919 calving front positions for 149 marine-terminating glaciers in Svalbard from 1985 to 2023, generated with deep learning, we identify pervasive calving front retreats for non-surging glaciers over the past 38 years. We observe widespread seasonal cycles in calving front position for over half of the glaciers. At the seasonal timescale, peak retreat rates exhibit a several-month phase lag, with changes on the west coast occurring before those on the east coast, coincident with regional ocean warming. This spatial variability in seasonal patterns is linked to different timings of warm ocean water inflow from the West Spitsbergen Current, demonstrating the dominant role of ice-ocean interaction in seasonal front changes. The interannual variability of calving front retreat shows a strong sensitivity to both atmospheric and oceanic warming, with immediate responses to large air and ocean temperature anomalies in 2016 and 2019, likely driven by atmospheric blocking that can influence extreme temperature variability. With more frequent blocking occurring and continued regional warming, future calving front retreats will likely intensify, leading to more significant glacier mass loss.

MCML Authors
Link to Profile Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[1297]
S. Feuerriegel, A. Maarouf, D. Bär, D. Geissler, J. Schweisthal, N. Pröllochs, C. E. Robertson, S. Rathje, J. Hartmann, S. M. Mohammad, O. Netzer, A. A. Siegel, B. Plank and J. J. Van Bavel.
Using natural language processing to analyse text data in behavioural science.
Nature Reviews Psychology (Jan. 2025). DOI
Abstract

Language is a uniquely human trait at the core of human interactions. The language people use often reflects their personality, intentions and state of mind. With the integration of the Internet and social media into everyday life, much of human communication is documented as written text. These online forms of communication (for example, blogs, reviews, social media posts and emails) provide a window into human behaviour and therefore present abundant research opportunities for behavioural science. In this Review, we describe how natural language processing (NLP) can be used to analyse text data in behavioural science. First, we review applications of text data in behavioural science. Second, we describe the NLP pipeline and explain the underlying modelling approaches (for example, dictionary-based approaches and large language models). We discuss the advantages and disadvantages of these methods for behavioural science, in particular with respect to the trade-off between interpretability and accuracy. Finally, we provide actionable recommendations for using NLP to ensure rigour and reproducibility.
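A dictionary-based approach, the simplest of the modelling families the Review covers, can be sketched in a few lines (toy lexicon and hypothetical function name):

```python
def dictionary_score(text, lexicon):
    """Dictionary-based text analysis: the share of tokens that match
    a predefined word list (e.g. a sentiment or emotion lexicon)."""
    tokens = text.lower().split()
    hits = sum(1 for t in tokens if t.strip(".,!?") in lexicon)
    return hits / len(tokens) if tokens else 0.0

positive = {"good", "great", "happy"}
score = dictionary_score("A great day, a good mood!", positive)  # 2 of 6 tokens
```

Such lexicon counts are highly interpretable but less accurate than large language models, the trade-off the Review discusses.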

MCML Authors
Link to Profile Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management

Link to website

Abdurahman Maarouf

Artificial Intelligence in Management

Link to website

Dominik Bär

Artificial Intelligence in Management

Link to website

Jonas Schweisthal

Artificial Intelligence in Management

Link to Profile Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1296]
B. Lange.
Duplicates and Collective Scarcity.
Philosophy and Technology 38.7 (Jan. 2025). DOI
Abstract

Digital duplicates reduce the scarcity of individuals and thus may impact their instrumental and intrinsic value. I here expand upon this idea by introducing the notion of collective scarcity, which pertains to the limitations faced by social groups in maintaining their size, cohesion and function.

MCML Authors
Link to Profile Ben Lange

Ben Lange

Dr.

Ethics of Artificial Intelligence


[1295]
M. Binz, S. Alaniz, A. Roskies, B. Aczel, C. T. Bergstrom, C. Allen, D. Schad, D. Wulff, J. D. West, Q. Zhang, R. M. Shiffrin, S. J. Gershman, V. Popov, E. M. Bender, M. Marelli, M. M. Botvinick, Z. Akata and E. Schulz.
How should the advancement of large language models affect the practice of science?.
Proceedings of the National Academy of Sciences 122.5 (Jan. 2025). DOI
Abstract

Large language models (LLMs) are being increasingly incorporated into scientific workflows. However, we have yet to fully grasp the implications of this integration. How should the advancement of large language models affect the practice of science? For this opinion piece, we have invited four diverse groups of scientists to reflect on this query, sharing their perspectives and engaging in debate. Schulz et al. make the argument that working with LLMs is not fundamentally different from working with human collaborators, while Bender et al. argue that LLMs are often misused and overhyped, and that their limitations warrant a focus on more specialized, easily interpretable tools. Marelli et al. emphasize the importance of transparent attribution and responsible use of LLMs. Finally, Botvinick and Gershman advocate that humans should retain responsibility for determining the scientific roadmap. To facilitate the discussion, the four perspectives are complemented with a response from each group. By putting these different perspectives in conversation, we aim to bring attention to important considerations within the academic community regarding the adoption of LLMs and their impact on both current and future scientific practices.

MCML Authors
Link to Profile Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1294]
K. Ghosh, M. Todorović, A. Vehtari and P. Rinke.
Active learning of molecular data for task-specific objectives.
The Journal of Chemical Physics 162.014103 (Jan. 2025). DOI
Abstract

Active learning (AL) has shown promise to be a particularly data-efficient machine learning approach. Yet, its performance depends on the application, and it is not clear when AL practitioners can expect computational savings. Here, we carry out a systematic AL performance assessment for three diverse molecular datasets and two common scientific tasks: compiling compact, informative datasets and targeted molecular searches. We implemented AL with Gaussian processes (GP) and used the many-body tensor as molecular representation. For the first task, we tested different data acquisition strategies, batch sizes, and GP noise settings. AL was insensitive to the acquisition batch size, and we observed the best AL performance for the acquisition strategy that combines uncertainty reduction with clustering to promote diversity. However, for optimal GP noise settings, AL did not outperform the randomized selection of data points. Conversely, for targeted searches, AL outperformed random sampling and achieved data savings of up to 64%. Our analysis provides insight into this task-specific performance difference in terms of target distributions and data collection strategies. We established that the performance of AL depends on the relative distribution of the target molecules in comparison to the total dataset distribution, with the largest computational savings achieved when their overlap is minimal.
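The acquisition step at the heart of such an AL loop can be sketched in a few lines. This is an illustrative sketch under our own assumptions, not the authors' code: `acquire` and `toy_predict` are hypothetical names, and the toy surrogate merely stands in for a trained GP's predictive variance.

```python
# Illustrative sketch (assumption): uncertainty-based acquisition for active
# learning - pick the unlabeled candidate with the largest predictive
# variance, as a GP-driven AL loop would.
def acquire(candidates, predict):
    """predict(x) -> (mean, variance); return index of most uncertain point."""
    variances = [predict(x)[1] for x in candidates]
    return max(range(len(candidates)), key=lambda i: variances[i])

# Toy surrogate: variance grows with distance from a single labeled point at 0.
toy_predict = lambda x: (0.0, abs(x))
idx = acquire([0.5, 3.0, 1.2, -2.0], toy_predict)  # most uncertain candidate
```

In a real GP-based loop, the selected point would be labeled, added to the training set, and the model refit before the next acquisition.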

MCML Authors
Link to Profile Patrick Rinke

Patrick Rinke

Prof. Dr.

AI-based Material Science


[1293]
C. Curreli, D. Muhle, A. Saroha, Z. Ye, R. Marin and D. Cremers.
Nonisotropic Gaussian Diffusion for Realistic 3D Human Motion Prediction.
Preprint (Jan. 2025). arXiv GitHub
Abstract

Probabilistic human motion prediction aims to forecast multiple possible future movements from past observations. While current approaches report high diversity and realism, they often generate motions with undetected limb stretching and jitter. To address this, we introduce SkeletonDiffusion, a latent diffusion model that embeds an explicit inductive bias on the human body within its architecture and training. Our model is trained with a novel nonisotropic Gaussian diffusion formulation that aligns with the natural kinematic structure of the human skeleton. Results show that our approach outperforms conventional isotropic alternatives, consistently generating realistic predictions while avoiding artifacts such as limb distortion. Additionally, we identify a limitation in commonly used diversity metrics, which may inadvertently favor models that produce inconsistent limb lengths within the same sequence. SkeletonDiffusion sets a new benchmark on three real-world datasets, outperforming various baselines across multiple evaluation metrics.

MCML Authors
Link to website

Cecilia Curreli

Computer Vision & Artificial Intelligence

Link to website

Dominik Muhle

Computer Vision & Artificial Intelligence

Link to website

Abhishek Saroha

Computer Vision & Artificial Intelligence

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1292]
F. Drexel, V. Sideri-Lampretsa, H. Bast, A. W. Marka, T. Koehler, F. T. Gassert, D. Pfeiffer, D. Rückert and F. Pfeiffer.
Deformable Image Registration of Dark-Field Chest Radiographs for Local Lung Signal Change Assessment.
Preprint (Jan. 2025). arXiv
Abstract

Dark-field radiography of the human chest has been demonstrated to have promising potential for the analysis of the lung microstructure and the diagnosis of respiratory diseases. However, previous studies of dark-field chest radiographs evaluated the lung signal only in the inspiratory breathing state. Our work aims to add a new perspective to these previous assessments by locally comparing dark-field lung information between different respiratory states. To this end, we discuss suitable image registration methods for dark-field chest radiographs to enable consistent spatial alignment of the lung in distinct breathing states. Utilizing full inspiration and expiration scans from a clinical chronic obstructive pulmonary disease study, we assess the performance of the proposed registration framework and outline applicable evaluation approaches. Our regional characterization of lung dark-field signal changes between the breathing states provides a proof-of-principle that dynamic radiography-based lung function assessment approaches may benefit from considering registered dark-field images in addition to standard plain chest radiographs.

MCML Authors
Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[1291]
F. Duelmer, M. Azampour and N. Navab.
UltraRay: Full-Path Ray Tracing for Enhancing Realism in Ultrasound Simulation.
Preprint (Jan. 2025). arXiv
Abstract

Traditional ultrasound simulators solve the wave equation to model pressure distribution fields, achieving high accuracy but requiring significant computational time and resources. To address this, ray tracing approaches have been introduced, modeling wave propagation as rays interacting with boundaries and scatterers. However, existing models simplify ray propagation, generating echoes at interaction points without considering return paths to the sensor. This can result in unrealistic artifacts and necessitates careful scene tuning for plausible results. We propose a novel ultrasound simulation pipeline that utilizes a ray tracing algorithm to generate echo data, tracing each ray from the transducer through the scene and back to the sensor. To replicate advanced ultrasound imaging, we introduce a ray emission scheme optimized for plane wave imaging, incorporating delay and steering capabilities. Furthermore, we integrate a standard signal processing pipeline to simulate end-to-end ultrasound image formation. We showcase the efficacy of the proposed pipeline by modeling synthetic scenes featuring highly reflective objects, such as bones. In doing so, our proposed approach, UltraRay, not only enhances the overall visual quality but also improves the realism of the simulated images by accurately capturing secondary reflections and reducing unnatural artifacts. By building on top of a differentiable framework, the proposed pipeline lays the groundwork for a fast and differentiable ultrasound simulation tool necessary for gradient-based optimization, enabling advanced ultrasound beamforming strategies, neural network integration, and accurate inverse scene reconstruction.

MCML Authors
Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[1290]
S. Eckman, B. Ma, C. Kern, R. Chew, B. Plank and F. Kreuter.
Correcting Annotator Bias in Training Data: Population-Aligned Instance Replication (PAIR).
Preprint (Jan. 2025). arXiv
Abstract

Models trained on crowdsourced labels may not reflect broader population views when annotator pools are not representative. Since collecting representative labels is challenging, we propose Population-Aligned Instance Replication (PAIR), a method to address this bias through statistical adjustment. Using a simulation study of hate speech and offensive language detection, we create two types of annotators with different labeling tendencies and generate datasets with varying proportions of the types. Models trained on unbalanced annotator pools show poor calibration compared to those trained on representative data. However, PAIR, which duplicates labels from underrepresented annotator groups to match population proportions, significantly reduces bias without requiring new data collection. These results suggest statistical techniques from survey research can help align model training with target populations even when representative annotator pools are unavailable. We conclude with three practical recommendations for improving training data quality.
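The core replication idea (duplicating labels from underrepresented annotator groups until the pool matches assumed population proportions) can be sketched as follows. This is an illustrative sketch rather than the authors' released code; the function name `pair_replicate` and the integer rounding rule are our assumptions.

```python
# Illustrative sketch (not the authors' code): PAIR-style label replication.
# Duplicate labels from underrepresented annotator groups so the pool
# approaches assumed target-population proportions.
from collections import Counter

def pair_replicate(labels, groups, target_props):
    """labels: annotations; groups: annotator group per label;
    target_props: desired share of each group in the population."""
    counts = Counter(groups)
    n = len(labels)
    out_labels, out_groups = [], []
    for lab, g in zip(labels, groups):
        observed = counts[g] / n
        # replication factor: roughly how many copies reach the target share
        k = max(1, round(target_props[g] / observed))
        out_labels.extend([lab] * k)
        out_groups.extend([g] * k)
    return out_labels, out_groups

labels = [1, 1, 0, 0, 0, 0, 0, 0]                   # toy annotations
groups = ["A", "A", "B", "B", "B", "B", "B", "B"]   # group A underrepresented
target = {"A": 0.5, "B": 0.5}                       # assumed population shares
new_labels, new_groups = pair_replicate(labels, groups, target)
```

Here each label from the underrepresented group A is duplicated, moving the pool's group shares toward the assumed population proportions without collecting new data.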

MCML Authors
Link to website

Bolei Ma

Social Data Science and AI Lab

Link to Profile Christoph Kern

Christoph Kern

Prof. Dr.

Social Data Science and AI Lab

Link to Profile Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

Link to Profile Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab


[1289]
Y. Feng, S. Feuerriegel and Y. R. Shrestha.
Contextualizing Recommendation Explanations with LLMs: A User Study.
Preprint (Jan. 2025). arXiv
Abstract

Large language models (LLMs) are increasingly prevalent in recommender systems, where LLMs can be used to generate personalized recommendations. Here, we examine how different LLM-generated explanations for movie recommendations affect users’ perceptions of cognitive, affective, and utilitarian needs and consumption intentions. In a pre-registered, between-subject online experiment (N=759) and follow-up interviews (N=30), we compare (a) LLM-generated generic explanations, and (b) LLM-generated contextualized explanations. Our findings show that contextualized explanations (i.e., explanations that incorporate users’ past behaviors) effectively meet users’ cognitive needs while increasing users’ intentions to watch recommended movies. However, adding explanations offers limited benefits in meeting users’ utilitarian and affective needs, raising concerns about the proper design and implications of LLM-generated explanations. Qualitative insights from interviews reveal that referencing users’ past preferences enhances trust and understanding but can feel excessive if overused. Furthermore, users with more active and positive engagement with the recommender system and movie-watching get substantial gains from contextualized explanations. Overall, our research clarifies how LLM-generated recommendations influence users’ motivations and behaviors, providing valuable insights for the future development of user-centric recommender systems, a key element in social media platforms and online ecosystems.

MCML Authors
Link to Profile Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1288]
U. Fischer-Abaigar, C. Kern and J. Perdomo.
The Value of Prediction in Identifying the Worst-Off.
Preprint (Jan. 2025). arXiv
Abstract

Machine learning is increasingly used in government programs to identify and support the most vulnerable individuals, prioritizing assistance for those at greatest risk over optimizing aggregate outcomes. This paper examines the welfare impacts of prediction in equity-driven contexts, and how they compare to other policy levers, such as expanding bureaucratic capacity. Through mathematical models and a real-world case study on long-term unemployment amongst German residents, we develop a comprehensive understanding of the relative effectiveness of prediction in surfacing the worst-off. Our findings provide clear analytical frameworks and practical, data-driven tools that empower policymakers to make principled decisions when designing these systems.

MCML Authors
Link to Profile Christoph Kern

Christoph Kern

Prof. Dr.

Social Data Science and AI Lab


[1287]
Z. Haouari, J. Weidner, I. Ezhov, A. Varma, D. Rückert, B. Menze and B. Wiestler.
Efficient Deep Learning-based Forward Solvers for Brain Tumor Growth Models.
Preprint (Jan. 2025). arXiv
Abstract

Glioblastoma, a highly aggressive brain tumor, poses major challenges due to its poor prognosis and high morbidity rates. Partial differential equation-based models offer promising potential to enhance therapeutic outcomes by simulating patient-specific tumor behavior for improved radiotherapy planning. However, model calibration remains a bottleneck due to the high computational demands of optimization methods like Monte Carlo sampling and evolutionary algorithms. To address this, we recently introduced an approach leveraging a neural forward solver with gradient-based optimization to significantly reduce calibration time. This approach requires a highly accurate and fully differentiable forward model. We investigate multiple architectures, including (i) an enhanced TumorSurrogate, (ii) a modified nnU-Net, and (iii) a 3D Vision Transformer (ViT). The optimized TumorSurrogate achieved the best overall results, excelling in both tumor outline matching and voxel-level prediction of tumor cell concentration. It halved the MSE relative to the baseline model and achieved the highest Dice score across all tumor cell concentration thresholds. Our study demonstrates significant enhancement in forward solver performance and outlines important future research directions.

MCML Authors
Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine

Link to Profile Benedikt Wiestler

Benedikt Wiestler

Prof. Dr.

AI for Image-Guided Diagnosis and Therapy


[1286]
B. Jian, J. Pan, Y. Li, F. Bongratz, R. Li, D. Rückert, B. Wiestler and C. Wachinger.
TimeFlow: Longitudinal Brain Image Registration and Aging Progression Analysis.
Preprint (Jan. 2025). arXiv
Abstract

Predicting future brain states is crucial for understanding healthy aging and neurodegenerative diseases. Longitudinal brain MRI registration, a cornerstone for such analyses, has long been limited by its inability to forecast future developments, reliance on extensive, dense longitudinal data, and the need to balance registration accuracy with temporal smoothness. In this work, we present TimeFlow, a novel framework for longitudinal brain MRI registration that overcomes all these challenges. Leveraging a U-Net architecture with temporal conditioning inspired by diffusion models, TimeFlow enables accurate longitudinal registration and facilitates prospective analyses through future image prediction. Unlike traditional methods that depend on explicit smoothness regularizers and dense sequential data, TimeFlow achieves temporal consistency and continuity without these constraints. Experimental results highlight its superior performance in both future timepoint prediction and registration accuracy compared to state-of-the-art methods. Additionally, TimeFlow supports novel biological brain aging analyses, effectively differentiating neurodegenerative conditions from healthy aging. It eliminates the need for segmentation, thereby avoiding the challenges of non-trivial annotation and inconsistent segmentation errors. TimeFlow paves the way for accurate, data-efficient, and annotation-free prospective analyses of brain aging and chronic diseases.

MCML Authors
Link to website

Bailiang Jian

Artificial Intelligence in Radiology

Link to website

Yitong Li

Artificial Intelligence in Radiology

Link to website

Fabian Bongratz

Artificial Intelligence in Radiology

Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine

Link to Profile Benedikt Wiestler

Benedikt Wiestler

Prof. Dr.

AI for Image-Guided Diagnosis and Therapy

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Radiology


[1285]
O. Kononykhina, M. Schierholz and F. Kreuter.
The Impact of Question Framing on the Precision of Automatic Occupation Coding.
Preprint (Jan. 2025). arXiv
Abstract

Occupational data play a vital role in research, official statistics, and policymaking, yet their collection and accurate classification remain a persistent challenge. This study investigates the effects of occupational question wording on data variability and the performance of automatic coding tools. Through a series of survey experiments conducted and replicated in Germany, we tested two widely-used occupational question formats: one focusing on ‘job title’ (Berufsbezeichnung) and another on ‘occupational tasks’ (berufliche Tätigkeit). Our analysis reveals that automatic coding tools, such as CASCOT and OccuCoDe, exhibit significant sensitivity to the form and origin of the data. Specifically, these tools performed more efficiently when coding responses to the job title question format compared to the occupational task format. Additionally, we found that including examples of main tasks and duties in the questions led respondents to provide more detailed but less linguistically diverse responses. This reduced diversity may negatively affect the precision of automatic coding. These findings highlight the importance of tailoring automatic coding tools to the specific structure and origin of the data they are applied to. We emphasize the need for further research to optimize question design and coding tools for greater accuracy and applicability in occupational data collection.

MCML Authors
Link to website

Olga Kononykhina

Social Data Science and AI Lab

Link to website

Malte Schierholz

Dr.

Social Data Science and AI Lab

Link to Profile Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab


[1284]
C. Leininger, S. Rittel and L. Bothmann.
Overcoming Fairness Trade-offs via Pre-processing: A Causal Perspective.
Preprint (Jan. 2025). arXiv
Abstract

Training machine learning models for fair decisions faces two key challenges: the fairness-accuracy trade-off, which arises because enforcing fairness weakens predictive performance relative to an unconstrained model, and the incompatibility of different fairness metrics, also known as the impossibility theorem. Recent work identifies the bias within the observed data as a possible root cause and shows that fairness and predictive performance are in fact in accord when predictive performance is measured on unbiased data. We offer a causal explanation for these findings using the framework of the FiND (fictitious and normatively desired) world, a ‘fair’ world, where protected attributes have no causal effects on the target variable. We show theoretically that (i) classical fairness metrics deemed to be incompatible are naturally satisfied in the FiND world, while (ii) fairness aligns with high predictive performance. We extend our analysis by suggesting how one can benefit from these theoretical insights in practice, using causal pre-processing methods that approximate the FiND world. Additionally, we propose a method for evaluating the approximation of the FiND world via pre-processing in practical use cases where we do not have access to the FiND world. In simulations and empirical studies, we demonstrate that these pre-processing methods are successful in approximating the FiND world and resolve both trade-offs. Our results provide actionable solutions for practitioners to achieve fairness and high predictive performance simultaneously.
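A minimal flavor of causal pre-processing can be given by group-mean centering, which removes the marginal effect of a binary protected attribute from a feature. This is only an illustrative sketch under our own assumptions, not the paper's method: `residualize` is a hypothetical name, and the paper's FiND-world approximation is considerably more general than this crude adjustment.

```python
# Illustrative sketch (assumption): remove the group-level effect of a
# protected attribute A on a feature X by centering X within groups,
# then restoring the overall mean.
def residualize(x, a):
    groups = set(a)
    means = {g: sum(xi for xi, ai in zip(x, a) if ai == g) /
                sum(1 for ai in a if ai == g) for g in groups}
    overall = sum(x) / len(x)
    return [xi - means[ai] + overall for xi, ai in zip(x, a)]

x = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]  # feature correlated with group
a = [0, 0, 0, 1, 1, 1]              # binary protected attribute
x_fair = residualize(x, a)          # group means now coincide
```

After this step the two groups have identical feature means, so a downstream model can no longer pick up the group-level difference from this feature alone.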

MCML Authors
Link to website

Simon Rittel

Statistical Learning & Data Science

Link to website

Ludwig Bothmann

Dr.

Statistical Learning & Data Science


[1283]
T. Liu, X. Yu, W. Zhou, J. Gu and V. Tresp.
FocalPO: Enhancing Preference Optimizing by Focusing on Correct Preference Rankings.
Preprint (Jan. 2025). arXiv
Abstract

Efficient preference optimization algorithms such as Direct Preference Optimization (DPO) have become a popular approach in aligning large language models (LLMs) with human preferences. These algorithms implicitly treat the LLM as a reward model and focus on training it to correct misranked preference pairs. However, recent work (Chen et al., 2024) empirically finds that DPO training rarely improves these misranked preference pairs, despite its gradient emphasizing these cases. We introduce FocalPO, a DPO variant that instead down-weighs misranked preference pairs and prioritizes enhancing the model’s understanding of pairs that it can already rank correctly. Inspired by Focal Loss used in vision tasks, FocalPO achieves this by adding a modulating factor to dynamically scale DPO loss. Our experiments demonstrate that FocalPO surpasses DPO and its variants on popular benchmarks like Alpaca Eval 2.0 using Mistral-Base-7B and Llama-3-Instruct-8B. Additionally, we empirically reveal how FocalPO affects training on correct and incorrect sample groups, further underscoring its effectiveness.
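One plausible reading of the modulating factor can be sketched as follows. This is an illustrative sketch, not the released implementation: the exact factor is defined in the paper, and using the ranking confidence p raised to a power gamma (so that misranked pairs, where p is small, are down-weighted) is our assumption, by analogy with Focal Loss.

```python
# Illustrative sketch (assumption, not the authors' code): a focal-style
# factor that scales the per-pair DPO loss so that misranked pairs
# (ranking confidence p < 0.5) contribute less to the gradient.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def focal_dpo_loss(delta, beta=0.1, gamma=2.0):
    """delta: implicit reward margin (chosen minus rejected);
    p = sigmoid(beta * delta) is the model's ranking confidence."""
    p = sigmoid(beta * delta)
    dpo = -math.log(p)            # standard per-pair DPO loss
    return (p ** gamma) * dpo     # p^gamma shrinks toward 0 when misranked

correct = focal_dpo_loss(delta=20.0)   # pair already ranked correctly
wrong = focal_dpo_loss(delta=-20.0)    # misranked pair, down-weighted
```

With gamma = 0 the factor is 1 and the standard DPO loss is recovered; larger gamma shifts the training signal toward pairs the model already ranks correctly.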

MCML Authors
Link to website

Tong Liu

Database Systems & Data Mining

Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[1282]
T. Mortier, A. Javanmardi, Y. Sale, E. Hüllermeier and W. Waegeman.
Conformal Prediction in Hierarchical Classification.
Preprint (Jan. 2025). arXiv
Abstract

Conformal prediction has emerged as a widely used framework for constructing valid prediction sets in classification and regression tasks. In this work, we extend the split conformal prediction framework to hierarchical classification, where prediction sets are commonly restricted to internal nodes of a predefined hierarchy, and propose two computationally efficient inference algorithms. The first algorithm returns internal nodes as prediction sets, while the second relaxes this restriction, using the notion of representation complexity, yielding a more general and combinatorial inference problem, but smaller set sizes. Empirical evaluations on several benchmark datasets demonstrate the effectiveness of the proposed algorithms in achieving nominal coverage.
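The split conformal recipe the paper builds on can be sketched for the flat-label case. This is an illustrative sketch under simplifying assumptions, not the paper's hierarchical algorithms, which additionally restrict prediction sets to internal nodes of a taxonomy; the function names are ours.

```python
# Illustrative sketch (simplified, flat-label split conformal): calibrate a
# threshold on nonconformity scores, then return every candidate label
# whose score falls below it.
import math

def conformal_threshold(cal_scores, alpha=0.1):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))   # rank of the corrected quantile
    return sorted(cal_scores)[min(k, n) - 1]

def prediction_set(label_scores, threshold):
    """label_scores: nonconformity score per candidate label."""
    return {lab for lab, s in label_scores.items() if s <= threshold}

cal = [0.1, 0.3, 0.2, 0.5, 0.4, 0.15, 0.25, 0.35, 0.45]  # toy calibration set
q = conformal_threshold(cal, alpha=0.2)
pred = prediction_set({"cat": 0.1, "dog": 0.3, "car": 0.9}, q)
```

The guarantee is marginal: over repeated draws, the set contains the true label with probability at least 1 - alpha, regardless of the underlying classifier.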

MCML Authors
Link to website

Alireza Javanmardi

Artificial Intelligence & Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[1281]
A. Reuter, T. G. J. Rudner, V. Fortuin and D. Rügamer.
Can Transformers Learn Full Bayesian Inference in Context?.
Preprint (Jan. 2025). arXiv
Abstract

Transformers have emerged as the dominant architecture in the field of deep learning, with a broad range of applications and remarkable in-context learning (ICL) capabilities. While not yet fully understood, ICL has already proved to be an intriguing phenomenon, allowing transformers to learn in context – without requiring further training. In this paper, we further advance the understanding of ICL by demonstrating that transformers can perform full Bayesian inference for commonly used statistical models in context. More specifically, we introduce a general framework that builds on ideas from prior fitted networks and continuous normalizing flows which enables us to infer complex posterior distributions for methods such as generalized linear models and latent factor models. Extensive experiments on real-world datasets demonstrate that our ICL approach yields posterior samples that are similar in quality to state-of-the-art MCMC or variational inference methods not operating in context.

MCML Authors
Link to Profile Vincent Fortuin

Vincent Fortuin

Dr.

Bayesian Deep Learning

Link to Profile David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[1280]
A. Saroha, F. Hofherr, M. Gladkova, C. Curreli, O. Litany and D. Cremers.
ZDySS -- Zero-Shot Dynamic Scene Stylization using Gaussian Splatting.
Preprint (Jan. 2025). arXiv
Abstract

Stylizing a dynamic scene based on an exemplar image is critical for various real-world applications, including gaming, filmmaking, and augmented and virtual reality. However, achieving consistent stylization across both spatial and temporal dimensions remains a significant challenge. Most existing methods are designed for static scenes and often require an optimization process for each style image, limiting their adaptability. We introduce ZDySS, a zero-shot stylization framework for dynamic scenes, allowing our model to generalize to previously unseen style images at inference. Our approach employs Gaussian splatting for scene representation, linking each Gaussian to a learned feature vector that renders a feature map for any given view and timestamp. By applying style transfer on the learned feature vectors instead of the rendered feature map, we enhance spatio-temporal consistency across frames. Our method demonstrates superior performance and coherence over state-of-the-art baselines in tests on real-world dynamic scenes, making it a robust solution for practical applications.

MCML Authors
Link to website

Abhishek Saroha

Computer Vision & Artificial Intelligence

Link to website

Florian Hofherr

Computer Vision & Artificial Intelligence

Link to website

Mariia Gladkova

Computer Vision & Artificial Intelligence

Link to website

Cecilia Curreli

Computer Vision & Artificial Intelligence

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1279]
R. Schwank, A. McCormack and M. Drton.
Robust Score Matching.
Preprint (Jan. 2025). arXiv
Abstract

Proposed in Hyvärinen (2005), score matching is a parameter estimation procedure that does not require computation of distributional normalizing constants. In this work we utilize the geometric median of means to develop a robust score matching procedure that yields consistent parameter estimates in settings where the observed data has been contaminated. A special appeal of the proposed method is that it retains convexity in exponential family models. The new method is therefore particularly attractive for non-Gaussian, exponential family graphical models where evaluation of normalizing constants is intractable. Support recovery guarantees for such models when contamination is present are provided. Additionally, support recovery is studied in numerical experiments and on a precipitation dataset. We demonstrate that the proposed robust score matching estimator performs comparably to the standard score matching estimator when no contamination is present but greatly outperforms this estimator in a setting with contamination.
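The robustness mechanism can be illustrated with the one-dimensional median of means; note this is a simplified sketch under our own assumptions, as the paper uses the geometric median of means for vector-valued quantities.

```python
# Illustrative sketch (assumption: 1-D median of means): split the sample
# into blocks, average each block, and take the median of the block means.
# A few gross outliers corrupt at most a few blocks, leaving the median intact.
import statistics

def median_of_means(xs, n_blocks=5):
    m = len(xs) // n_blocks
    block_means = [sum(xs[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
    return statistics.median(block_means)

# Clean samples around 0 plus two gross outliers ("contamination").
data = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.2, -0.15, 0.1,
        0.0, -0.1, 0.05, 100.0, 100.0]
robust = median_of_means(data, n_blocks=5)  # stays near the clean mean
naive = sum(data) / len(data)               # dragged away by the outliers
```

The contaminated observations land in a single block, so only one block mean is corrupted and the median over blocks remains close to the uncontaminated mean.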

MCML Authors
Link to Profile Mathias Drton

Mathias Drton

Prof. Dr.

Mathematical Statistics


[1278]
M. H. Shaker and E. Hüllermeier.
Random Forest Calibration.
Preprint (Jan. 2025). arXiv
Abstract

The Random Forest (RF) classifier is often claimed to be relatively well calibrated when compared with other machine learning methods. Moreover, the existing literature suggests that traditional calibration methods, such as isotonic regression, do not substantially enhance the calibration of RF probability estimates unless supplied with extensive calibration data sets, which can represent a significant obstacle in cases of limited data availability. Nevertheless, there seems to be no comprehensive study validating such claims and systematically comparing state-of-the-art calibration methods specifically for RF. To close this gap, we investigate a broad spectrum of calibration methods tailored to or at least applicable to RF, ranging from scaling techniques to more advanced algorithms. Our results, based on synthetic as well as real-world data, unravel the intricacies of RF probability estimates, scrutinize the impacts of hyper-parameters, and compare calibration methods in a systematic way. We show that a well-optimized RF performs as well as or better than leading calibration approaches.
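Comparisons of this kind rest on a calibration metric; a standard choice is the expected calibration error (ECE), sketched below. This is an illustrative sketch of a common metric, not code from the paper, and the function name is ours.

```python
# Illustrative sketch (not from the paper): expected calibration error (ECE).
# Bin predictions by confidence and compare average confidence to the
# empirical frequency of the positive class in each bin.
def expected_calibration_error(probs, labels, n_bins=10):
    """probs: predicted probability of the positive class; labels: 0/1."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)   # assign to a confidence bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(p for p, _ in b) / len(b)
        acc = sum(y for _, y in b) / len(b)      # empirical frequency
        ece += (len(b) / n) * abs(avg_conf - acc)
    return ece

# Perfectly calibrated toy predictions yield an ECE of (essentially) zero.
probs = [0.2, 0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8, 0.8]
labels = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
ece = expected_calibration_error(probs, labels)
```

A well-calibrated model has predictions that match observed frequencies bin by bin, driving the ECE toward zero.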

MCML Authors
Link to website

Mohammad Hossein Shaker

Artificial Intelligence & Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[1277]
I. Tsangko, A. Triantafyllopoulos, M. Müller, H. Schröter and B. W. Schuller.
DFingerNet: Noise-Adaptive Speech Enhancement for Hearing Aids.
Preprint (Jan. 2025). arXiv
Abstract

The DeepFilterNet (DFN) architecture was recently proposed as a deep learning model suited for hearing aid devices. Despite its competitive performance on numerous benchmarks, it still follows a ‘one-size-fits-all’ approach, which aims to train a single, monolithic architecture that generalises across different noises and environments. However, its limited size and computation budget can hamper its generalisability. Recent work has shown that in-context adaptation can mitigate this by conditioning the denoising process on additional information extracted from background recordings. These recordings can be offloaded outside the hearing aid, thus improving performance while adding minimal computational overhead. We introduce these principles to the DFN model, thus proposing the DFingerNet (DFiN) model, which shows superior performance on various benchmarks inspired by the DNS Challenge.

MCML Authors
Link to website

Andreas Triantafyllopoulos

Health Informatics

Link to Profile Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[1276]
T. N. Wolf and C. Wachinger.
WASUP: Interpretable Classification with Weight-Input Alignment and Class-Discriminative SUPports Vectors.
Preprint (Jan. 2025). arXiv
Abstract

The deployment of deep learning models in critical domains necessitates a balance between high accuracy and interpretability. We introduce WASUP, an inherently interpretable neural network that provides local and global explanations of its decision-making process. We prove that these explanations are faithful by fulfilling established axioms for explanations. Leveraging the concept of case-based reasoning, WASUP extracts class-representative support vectors from training images, ensuring they capture relevant features while suppressing irrelevant ones. Classification decisions are made by calculating and aggregating similarity scores between these support vectors and the input’s latent feature vector. We employ B-Cos transformations, which align model weights with inputs to enable faithful mappings of latent features back to the input space, facilitating local explanations in addition to global explanations of case-based reasoning. We evaluate WASUP on three tasks: fine-grained classification on Stanford Dogs, multi-label classification on Pascal VOC, and pathology detection on the RSNA dataset. Results indicate that WASUP not only achieves competitive accuracy compared to state-of-the-art black-box models but also offers insightful explanations verified through theoretical analysis. Our findings underscore WASUP’s potential for applications where understanding model decisions is as critical as the decisions themselves.

MCML Authors
Link to website

Tom Nuno Wolf

Artificial Intelligence in Radiology

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Radiology


[1275]
Z. Yang, M. Song, X. Jing, H. Zhang, K. Qian, B. Hu, K. Tamada, T. Takumi, B. W. Schuller and Y. Yamamoto.
MADUV: The 1st INTERSPEECH Mice Autism Detection via Ultrasound Vocalization Challenge.
Preprint (Jan. 2025). arXiv
Abstract

The Mice Autism Detection via Ultrasound Vocalization (MADUV) Challenge introduces the first INTERSPEECH challenge focused on detecting autism spectrum disorder (ASD) in mice through their vocalizations. Participants are tasked with developing models to automatically classify mice as either wild-type or ASD models based on recordings with a high sampling rate. Our baseline system employs a simple CNN-based classification using three different spectrogram features. Results demonstrate the feasibility of automated ASD detection, with the considered audible-range features achieving the best performance (UAR of 0.600 for segment-level and 0.625 for subject-level classification). This challenge bridges speech technology and biomedical research, offering opportunities to advance our understanding of ASD models through machine learning approaches. The findings suggest promising directions for vocalization analysis and highlight the potential value of audible and ultrasound vocalizations in ASD detection.
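The UAR (unweighted average recall) metric used for the baseline results is the mean of the per-class recalls, which is robust to class imbalance. A minimal sketch (function name and example data are ours):

```python
def uar(y_true, y_pred):
    """Unweighted average recall: the mean of per-class recalls."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(classes)

# Imbalanced toy labels: class 0 recall 1.0, class 1 recall 0.5
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0]
score = uar(y_true, y_pred)  # (1.0 + 0.5) / 2 = 0.75
```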

MCML Authors
Link to Profile Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


2024


[1274]
K. Bieker, H. T. Kussaba, P. Scholl, J. Jung, A. Swikir, S. Haddadin and G. Kutyniok.
Compositional Construction of Barrier Functions for Switched Impulsive Systems.
CDC 2024 - 63rd IEEE Conference on Decision and Control. Milan, Italy, Dec 16-19, 2024. To be published. Preprint available. arXiv
Abstract

Many systems occurring in real-world applications, such as controlling the motions of robots or modeling the spread of diseases, are switched impulsive systems. To ensure that the system state stays in a safe region (e.g., to avoid collisions with obstacles), barrier functions are widely utilized. As the system dimension increases, deriving suitable barrier functions becomes extremely complex. Fortunately, many systems consist of multiple subsystems, such as different areas where the disease occurs. In this work, we present sufficient conditions for interconnected switched impulsive systems to maintain safety by constructing local barrier functions for the individual subsystems instead of a global one, allowing for much easier and more efficient derivation. To validate our results, we numerically demonstrate their effectiveness using an epidemiological model.

MCML Authors
Link to website

Philipp Scholl

Mathematical Foundations of Artificial Intelligence

Link to Profile Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[1273]
A. Koebler, T. Decker, I. Thon, V. Tresp and F. Buettner.
Incremental Uncertainty-aware Performance Monitoring with Labeling Intervention.
BDU @NeurIPS 2024 - Workshop Bayesian Decision-making and Uncertainty: from probabilistic and spatiotemporal modeling to sequential experiment design at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. URL
Abstract

We study the problem of monitoring machine learning models under temporal distribution shifts, where circumstances change gradually over time, often leading to unnoticed yet significant declines in accuracy. We propose Incremental Uncertainty-aware Performance Monitoring (IUPM), a novel label-free method that estimates model performance by modeling time-dependent shifts using optimal transport. IUPM also quantifies uncertainty in performance estimates and introduces an active labeling strategy to reduce this uncertainty. We further showcase the benefits of IUPM on different datasets and simulated temporal shifts over existing baselines.

MCML Authors
Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[1272]
B. Cong, N. Daheim, Y. Shen, D. Cremers, R. Yokota, M. Khan and T. Möllenhoff.
Variational Low-Rank Adaptation Using IVON.
FITML @NeurIPS 2024 - Workshop Fine-Tuning in Modern Machine Learning: Principles and Scalability at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv GitHub
Abstract

We show that variational learning can significantly improve the accuracy and calibration of Low-Rank Adaptation (LoRA) without a substantial increase in cost. We replace AdamW with the Improved Variational Online Newton (IVON) algorithm to finetune large language models. For Llama-2 with 7 billion parameters, IVON improves the accuracy over AdamW by 2.8% and the expected calibration error by 4.6%. The accuracy is also better than the other Bayesian alternatives, yet the cost is lower and the implementation is easier. Our work provides additional evidence for the effectiveness of IVON for large language models.
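The expected calibration error (ECE) reported above is a standard binned metric: predictions are grouped by confidence, and the per-bin gaps between accuracy and mean confidence are averaged with bin-size weights. A minimal sketch (function name and binning scheme are ours, not the paper's evaluation code):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: size-weighted |accuracy - confidence| gap per bin."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(acc - conf)
    return ece

# Two over-confident-free and one over-confident prediction
ece = expected_calibration_error([0.9, 0.9, 0.6], [1, 1, 0])
```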

MCML Authors
Link to website

Yuesong Shen

Computer Vision & Artificial Intelligence

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1271]
M. Kollovieh, B. Charpentier, D. Zügner and S. Günnemann.
Expected Probabilistic Hierarchies.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published.
Abstract

Hierarchical clustering has usually been addressed by discrete optimization using heuristics or continuous optimization of relaxed scores for hierarchies. In this work, we propose to optimize expected scores under a probabilistic model over hierarchies. (1) We show theoretically that the global optimal values of the expected Dasgupta cost and Tree-Sampling divergence (TSD), two unsupervised metrics for hierarchical clustering, are equal to the optimal values of their discrete counterparts contrary to some relaxed scores. (2) We propose Expected Probabilistic Hierarchies (EPH), a probabilistic model to learn hierarchies in data by optimizing expected scores. EPH uses differentiable hierarchy sampling enabling end-to-end gradient descent based optimization, and an unbiased subgraph sampling approach to scale to large datasets. (3) We evaluate EPH on synthetic and real-world datasets including vector and graph datasets. EPH outperforms all other approaches quantitatively and provides meaningful hierarchies in qualitative evaluations.
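The Dasgupta cost mentioned above sums, over the weighted graph edges, the edge weight times the number of leaves under the endpoints' lowest common ancestor, so good hierarchies keep heavy edges low in the tree. A small sketch for binary trees encoded as nested tuples (encoding and function names are ours; EPH optimizes an expectation of this score, which is not shown here):

```python
def leaves(tree):
    """Set of leaf labels in a nested-tuple binary tree."""
    if not isinstance(tree, tuple):
        return {tree}
    return leaves(tree[0]) | leaves(tree[1])

def dasgupta_cost(tree, edges):
    """Sum over weighted edges (i, j, w) of w times the number of
    leaves under the lowest common ancestor of i and j."""
    def lca_size(t, i, j):
        if isinstance(t, tuple):
            for child in t:
                l = leaves(child)
                if i in l and j in l:
                    return lca_size(child, i, j)
        return len(leaves(t))
    return sum(w * lca_size(tree, i, j) for i, j, w in edges)

# Hierarchy ((0, 1), (2, 3)): lca(0, 1) covers 2 leaves, lca(0, 2) covers 4
cost = dasgupta_cost(((0, 1), (2, 3)), [(0, 1, 1.0), (0, 2, 1.0)])
```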

MCML Authors
Link to website

Marcel Kollovieh

Data Analytics & Machine Learning

Link to Profile Stephan Günnemann

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning


[1270]
V. Melnychuk, S. Feuerriegel and M. van der Schaar.
Quantifying Aleatoric Uncertainty of the Treatment Effect: A Novel Orthogonal Learner.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published.
Abstract

Estimating causal quantities from observational data is crucial for understanding the safety and effectiveness of medical treatments. However, to make reliable inferences, medical practitioners require not only estimating averaged causal quantities, such as the conditional average treatment effect, but also understanding the randomness of the treatment effect as a random variable. This randomness is referred to as aleatoric uncertainty and is necessary for understanding the probability of benefit from treatment or quantiles of the treatment effect. Yet, the aleatoric uncertainty of the treatment effect has received surprisingly little attention in the causal machine learning community. To fill this gap, we aim to quantify the aleatoric uncertainty of the treatment effect at the individualized (covariate-conditional) level, namely, the conditional distribution of the treatment effect (CDTE). Unlike average causal quantities, the CDTE is not point identifiable without strong additional assumptions. As a remedy, we employ partial identification to obtain sharp bounds on the CDTE and thereby quantify the aleatoric uncertainty of the treatment effect. We then develop a novel, orthogonal learner for the bounds on the CDTE, which we call AU-learner. We further show that our AU-learner has several strengths in that it satisfies Neyman-orthogonality and is doubly robust. Finally, we propose a fully-parametric deep learning instantiation of our AU-learner.

MCML Authors
Link to website

Valentyn Melnychuk

Artificial Intelligence in Management

Link to Profile Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1269]
E. Ailer, N. Dern, J. Hartford and N. Kilbertus.
Targeted Sequential Indirect Experiment Design.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

Scientific hypotheses typically concern specific aspects of complex, imperfectly understood or entirely unknown mechanisms, such as the effect of gene expression levels on phenotypes or how microbial communities influence environmental health. Such queries are inherently causal (rather than purely associational), but in many settings, experiments cannot be conducted directly on the target variables of interest and are instead indirect: they perturb the target variable but do not remove potential confounding factors. If, additionally, the resulting experimental measurements are multi-dimensional and the studied mechanisms nonlinear, the query of interest is generally not identified. We develop an adaptive strategy to design indirect experiments that optimally inform a targeted query about the ground truth mechanism in terms of sequentially narrowing the gap between an upper and lower bound on the query. While the general formulation consists of a bi-level optimization procedure, we derive an efficiently estimable analytical kernel-based estimator of the bounds for the causal effect, a query of key interest, and demonstrate the efficacy of our approach in confounded, multivariate, nonlinear synthetic settings.

MCML Authors
Link to website

Elisabeth Ailer

Ethics in Systems Design and Machine Learning

Link to Profile Niki Kilbertus

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning


[1268]
R. Dhahri, A. Immer, B. Charpentier, S. Günnemann and V. Fortuin.
Shaving Weights with Occam's Razor: Bayesian Sparsification for Neural Networks Using the Marginal Likelihood.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

Neural network sparsification is a promising avenue to save computational time and memory costs, especially in an age where many successful AI models are becoming too large to naïvely deploy on consumer hardware. While much work has focused on different weight pruning criteria, the overall sparsifiability of the network, i.e., its capacity to be pruned without quality loss, has often been overlooked. We present Sparsifiability via the Marginal likelihood (SpaM), a pruning framework that highlights the effectiveness of using the Bayesian marginal likelihood in conjunction with sparsity-inducing priors for making neural networks more sparsifiable. Our approach implements an automatic Occam’s razor that selects the most sparsifiable model that still explains the data well, both for structured and unstructured sparsification. In addition, we demonstrate that the pre-computed posterior Hessian approximation used in the Laplace approximation can be re-used to define a cheap pruning criterion, which outperforms many existing (more expensive) approaches. We demonstrate the effectiveness of our framework, especially at high sparsity levels, across a range of different neural network architectures and datasets.
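Hessian-based pruning criteria of the kind mentioned above go back to Optimal Brain Damage, where the saliency of a weight is the loss increase estimated from a diagonal Hessian. The sketch below shows that classic criterion, not SpaM's Laplace-based one; all names and numbers are our illustration.

```python
def obd_saliency(weights, hessian_diag):
    """OBD-style saliency w_i^2 * H_ii / 2: estimated loss increase
    when weight i is set to zero, under a diagonal Hessian."""
    return [0.5 * w * w * h for w, h in zip(weights, hessian_diag)]

def prune_mask(saliency, keep_frac):
    """Keep the top keep_frac fraction of weights by saliency."""
    k = max(1, int(len(saliency) * keep_frac))
    threshold = sorted(saliency, reverse=True)[k - 1]
    return [s >= threshold for s in saliency]

sal = obd_saliency([1.0, 0.1, 2.0, 0.5], [1.0, 1.0, 1.0, 1.0])
mask = prune_mask(sal, 0.5)  # keep the two most salient weights
```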

MCML Authors
Link to Profile Stephan Günnemann

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

Link to Profile Vincent Fortuin

Vincent Fortuin

Dr.

Bayesian Deep Learning


[1267]
L. Eyring, S. Karthik, K. Roth, A. Dosovitskiy and Z. Akata.
ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv GitHub
Abstract

Text-to-Image (T2I) models have made significant advancements in recent years, but they still struggle to accurately capture intricate details specified in complex compositional prompts. While fine-tuning T2I models with reward objectives has shown promise, it suffers from ‘reward hacking’ and may not generalize well to unseen prompt distributions. In this work, we propose Reward-based Noise Optimization (ReNO), a novel approach that enhances T2I models at inference by optimizing the initial noise based on the signal from one or multiple human preference reward models. Remarkably, solving this optimization problem with gradient ascent for 50 iterations yields impressive results on four different one-step models across two competitive benchmarks, T2I-CompBench and GenEval. Within a computational budget of 20-50 seconds, ReNO-enhanced one-step models consistently surpass the performance of all current open-source Text-to-Image models. Extensive user studies demonstrate that our model is preferred nearly twice as often compared to the popular SDXL model and is on par with the proprietary Stable Diffusion 3 with 8B parameters. Moreover, given the same computational resources, a ReNO-optimized one-step model outperforms widely-used open-source models such as SDXL and PixArt-α, highlighting the efficiency and effectiveness of ReNO in enhancing T2I model performance at inference time.
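The optimization loop at the heart of the approach, gradient ascent on the initial noise to maximize a reward, can be sketched abstractly. Below, the reward is a toy quadratic with an analytic gradient; the real method backpropagates reward-model signals through a one-step T2I generator, which is not reproduced here.

```python
import numpy as np

def optimize_noise(reward_grad, z0, steps=50, lr=0.1):
    """Gradient ascent on the initial noise z to maximize a reward."""
    z = z0.copy()
    for _ in range(steps):
        z = z + lr * reward_grad(z)
    return z

# Toy reward r(z) = -||z - target||^2 with gradient 2 * (target - z)
target = np.array([1.0, -2.0])
grad = lambda z: 2.0 * (target - z)
z_opt = optimize_noise(grad, np.zeros(2))  # converges toward target
```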

MCML Authors
Link to Profile Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1266]
F. Hoppe, C. M. Verdun, H. Laus, F. Krahmer and H. Rauhut.
Non-Asymptotic Uncertainty Quantification in High-Dimensional Learning.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

Uncertainty quantification (UQ) is a crucial but challenging task in many high-dimensional regression or learning problems to increase the confidence of a given predictor. We develop a new data-driven approach for UQ in regression that applies both to classical regression approaches such as the LASSO as well as to neural networks. One of the most notable UQ techniques is the debiased LASSO, which modifies the LASSO to allow for the construction of asymptotic confidence intervals by decomposing the estimation error into a Gaussian and an asymptotically vanishing bias component. However, in real-world problems with finite-dimensional data, the bias term is often too significant to be neglected, resulting in overly narrow confidence intervals. Our work rigorously addresses this issue and derives a data-driven adjustment that corrects the confidence intervals for a large class of predictors by estimating the means and variances of the bias terms from training data, exploiting high-dimensional concentration phenomena. This gives rise to non-asymptotic confidence intervals, which can help avoid overestimating uncertainty in critical applications such as MRI diagnosis. Importantly, our analysis extends beyond sparse regression to data-driven predictors like neural networks, enhancing the reliability of model-based deep learning. Our findings bridge the gap between established theory and the practical applicability of such debiased methods.

MCML Authors
Link to website

Claudio Mayrink Verdun

Dr.

* Former member

Link to website

Hannah Laus

Optimization & Data Analysis

Link to Profile Felix Krahmer

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Link to Profile Holger Rauhut

Holger Rauhut

Prof. Dr.

Mathematical Data Science and Artificial Intelligence


[1265]
A. Javanmardi, D. Stutz and E. Hüllermeier.
Conformalized Credal Set Predictors.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

Credal sets are sets of probability distributions that are considered as candidates for an imprecisely known ground-truth distribution. In machine learning, they have recently attracted attention as an appealing formalism for uncertainty representation, in particular due to their ability to represent both the aleatoric and epistemic uncertainty in a prediction. However, the design of methods for learning credal set predictors remains a challenging problem. In this paper, we make use of conformal prediction for this purpose. More specifically, we propose a method for predicting credal sets in the classification task, given training data labeled by probability distributions. Since our method inherits the coverage guarantees of conformal prediction, our conformal credal sets are guaranteed to be valid with high probability (without any assumptions on model or distribution). We demonstrate the applicability of our method to natural language inference, a highly ambiguous natural language task where it is common to obtain multiple annotations per example.
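The coverage guarantee inherited from conformal prediction rests on a simple mechanism: a finite-sample-corrected quantile of calibration nonconformity scores defines a threshold, and the prediction set collects every candidate below it. A minimal split-conformal sketch (labels and scores are invented; the paper's credal-set construction builds on this idea but differs in detail):

```python
import math

def conformal_quantile(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration
    nonconformity scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(cal_scores)[min(k, n) - 1]

def prediction_set(label_scores, qhat):
    """All labels whose nonconformity score is within the threshold."""
    return {y for y, s in label_scores.items() if s <= qhat}

qhat = conformal_quantile([0.1, 0.2, 0.3, 0.4], alpha=0.5)
labels = prediction_set({"entailment": 0.25, "contradiction": 0.35}, qhat)
```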

MCML Authors
Link to website

Alireza Javanmardi

Artificial Intelligence & Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[1264]
A. H. Kargaran, F. Yvon and H. Schütze.
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

The need for large text corpora has increased with the advent of pretrained language models and, in particular, the discovery of scaling laws for these models. Most available corpora have sufficient data only for languages with large dominant communities. However, there is no corpus available that (i) covers a wide range of minority languages; (ii) is generated by an open-source reproducible pipeline; and (iii) is rigorously cleaned from noise, making it trustworthy to use. We present GlotCC, a clean, document-level, 2TB general domain corpus derived from CommonCrawl, covering more than 1000 languages. We make GlotCC and the system used to generate it - including the pipeline, language identification model, and filters - available to the research community.

MCML Authors
Link to website

Amir Hossein Kargaran

Statistical NLP and Deep Learning

Link to Profile Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1263]
F. Koehler, S. Niedermayr, R. Westermann and N. Thuerey.
APEBench: A Benchmark for Autoregressive Neural Emulators of PDEs.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

We introduce the Autoregressive PDE Emulator Benchmark (APEBench), a comprehensive benchmark suite to evaluate autoregressive neural emulators for solving partial differential equations. APEBench is based on JAX and provides a seamlessly integrated differentiable simulation framework employing efficient pseudo-spectral methods, enabling 46 distinct PDEs across 1D, 2D, and 3D. Facilitating systematic analysis and comparison of learned emulators, we propose a novel taxonomy for unrolled training and introduce a unique identifier for PDE dynamics that directly relates to the stability criteria of classical numerical methods. APEBench enables the evaluation of diverse neural architectures, and unlike existing benchmarks, its tight integration of the solver enables support for differentiable physics training and neural-hybrid emulators. Moreover, APEBench emphasizes rollout metrics to understand temporal generalization, providing insights into the long-term behavior of emulating PDE dynamics. In several experiments, we highlight the similarities between neural emulators and numerical simulators.

MCML Authors
Link to Profile Rüdiger Westermann

Rüdiger Westermann

Prof. Dr.

Computer Graphics & Visualization

Link to Profile Nils Thuerey

Nils Thuerey

Prof. Dr.

Physics-based Simulation


[1262]
G. Ma, Y. Wang, D. Lim, S. Jegelka and Y. Wang.
A Canonicalization Perspective on Invariant and Equivariant Learning.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv GitHub
Abstract

In many applications, we desire neural networks to exhibit invariance or equivariance to certain groups due to symmetries inherent in the data. Recently, frame-averaging methods emerged to be a unified framework for attaining symmetries efficiently by averaging over input-dependent subsets of the group, i.e., frames. What we currently lack is a principled understanding of the design of frames. In this work, we introduce a canonicalization perspective that provides an essential and complete view of the design of frames. Canonicalization is a classic approach for attaining invariance by mapping inputs to their canonical forms. We show that there exists an inherent connection between frames and canonical forms. Leveraging this connection, we can efficiently compare the complexity of frames as well as determine the optimality of certain frames. Guided by this principle, we design novel frames for eigenvectors that are strictly superior to existing methods – some are even optimal – both theoretically and empirically. The reduction to the canonicalization perspective further uncovers equivalences between previous methods. These observations suggest that canonicalization provides a fundamental understanding of existing frame-averaging methods and unifies existing equivariant and invariant learning methods.
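The canonicalization idea itself is easy to state: map every input to a fixed representative of its orbit, then apply any function. For permutation symmetry, sorting is a valid canonical form, as the toy sketch below shows (this illustrates the classic concept only, not the paper's frame constructions for eigenvectors):

```python
def canonicalize(x):
    """Map an input to a canonical representative of its orbit;
    under permutations, sorting is a valid canonical form."""
    return tuple(sorted(x))

def make_invariant(f):
    """Turn any function into a permutation-invariant one by
    evaluating it on the canonical form."""
    return lambda x: f(canonicalize(x))

g = make_invariant(sum)
same = g([3, 1, 2]) == g([2, 3, 1])  # both calls see (1, 2, 3)
```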

MCML Authors
Link to Profile Stefanie Jegelka

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks


[1261]
Y. Ma, V. Melnychuk, J. Schweisthal and S. Feuerriegel.
DiffPO: A causal diffusion model for learning distributions of potential outcomes.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

Predicting potential outcomes of interventions from observational data is crucial for decision-making in medicine, but the task is challenging due to the fundamental problem of causal inference. Existing methods are largely limited to point estimates of potential outcomes with no uncertainty quantification; thus, the full information about the distributions of potential outcomes is typically ignored. In this paper, we propose a novel causal diffusion model called DiffPO, which is carefully designed for reliable inferences in medicine by learning the distribution of potential outcomes. In our DiffPO, we leverage a tailored conditional denoising diffusion model to learn complex distributions, where we address the selection bias through a novel orthogonal diffusion loss. Another strength of our DiffPO method is that it is highly flexible (e.g., it can also be used to estimate different causal quantities such as CATE). Across a wide range of experiments, we show that our method achieves state-of-the-art performance.

MCML Authors
Link to website

Yuchen Ma

Artificial Intelligence in Management

Link to website

Valentyn Melnychuk

Artificial Intelligence in Management

Link to website

Jonas Schweisthal

Artificial Intelligence in Management

Link to Profile Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1260]
M. Muschalik, H. Baniecki, F. Fumagalli, P. Kolpaczki, B. Hammer and E. Hüllermeier.
shapiq: Shapley Interactions for Machine Learning.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

Originally rooted in game theory, the Shapley Value (SV) has recently become an important tool in machine learning research. Perhaps most notably, it is used for feature attribution and data valuation in explainable artificial intelligence. Shapley Interactions (SIs) naturally extend the SV and address its limitations by assigning joint contributions to groups of entities, enhancing the understanding of black-box machine learning models. Due to the exponential complexity of computing SVs and SIs, various methods have been proposed that exploit structural assumptions or yield probabilistic estimates given limited resources. In this work, we introduce shapiq, an open-source Python package that unifies state-of-the-art algorithms to efficiently compute SVs and any-order SIs in an application-agnostic framework. Moreover, it includes a benchmarking suite containing 11 machine learning applications of SIs with pre-computed games and ground-truth values to systematically assess computational performance across domains. For practitioners, shapiq is able to explain and visualize any-order feature interactions in predictions of models, including vision transformers, language models, as well as XGBoost and LightGBM with TreeSHAP-IQ. With shapiq, we extend shap beyond feature attributions and consolidate the application of SVs and SIs in machine learning that facilitates future research.
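The exponential complexity mentioned above comes straight from the Shapley value's definition: every weight averages marginal contributions over all coalitions. The brute-force sketch below implements that textbook definition only; it is not the shapiq API, whose approximators exist precisely to avoid this enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values by enumerating all coalitions
    (O(2^n) evaluations of the value function)."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(set(S) | {i}) - value(set(S)))
        phi[i] = total
    return phi

# For an additive game, each player's Shapley value is its own payoff
phi = shapley_values([1, 2, 3], lambda S: sum(S))
```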

MCML Authors
Link to website

Maximilian Muschalik

Artificial Intelligence & Machine Learning

Link to website

Patrick Kolpaczki

Artificial Intelligence & Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[1259]
T. Nagler, L. Schneider, B. Bischl and M. Feurer.
Reshuffling Resampling Splits Can Improve Generalization of Hyperparameter Optimization.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv GitHub
Abstract

Hyperparameter optimization is crucial for obtaining peak performance of machine learning models. The standard protocol evaluates various hyperparameter configurations using a resampling estimate of the generalization error to guide optimization and select a final hyperparameter configuration. Without much evidence, paired resampling splits, i.e., either a fixed train-validation split or a fixed cross-validation scheme, are often recommended. We show that, surprisingly, reshuffling the splits for every configuration often improves the final model’s generalization performance on unseen data. Our theoretical analysis explains how reshuffling affects the asymptotic behavior of the validation loss surface and provides a bound on the expected regret in the limiting regime. This bound connects the potential benefits of reshuffling to the signal and noise characteristics of the underlying optimization problem. We confirm our theoretical results in a controlled simulation study and demonstrate the practical usefulness of reshuffling in a large-scale, realistic hyperparameter optimization experiment. While reshuffling leads to test performances that are competitive with using fixed splits, it drastically improves results for a single train-validation holdout protocol and can often make holdout competitive with standard CV while being computationally cheaper.
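The protocol change being studied is mechanically simple: instead of reusing one fixed train-validation split for all configurations, draw a fresh split per configuration. A minimal sketch (function name and sizes are our illustration, not the paper's experiment code):

```python
import random

def reshuffled_holdout(n, val_frac=0.2, rng=None):
    """Draw a fresh train/validation index split; called once per
    hyperparameter configuration instead of reusing a fixed split."""
    rng = rng or random.Random()
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(n * (1 - val_frac))
    return idx[:cut], idx[cut:]

# Each hyperparameter configuration gets its own reshuffled split
rng = random.Random(0)
splits = [reshuffled_holdout(100, rng=rng) for _ in range(3)]
```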

MCML Authors
Link to Profile Thomas Nagler

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

Link to website

Lennart Schneider

Statistical Learning & Data Science

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Profile Matthias Feurer

Matthias Feurer

Prof. Dr.

Statistical Learning & Data Science


[1258]
R. Paolino, S. Maskey, P. Welke and G. Kutyniok.
Weisfeiler and Leman Go Loopy: A New Hierarchy for Graph Representational Learning.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv GitHub
Abstract

We introduce r-loopy Weisfeiler-Leman (r-ℓWL), a novel hierarchy of graph isomorphism tests and a corresponding GNN framework, r-ℓMPNN, that can count cycles up to length r+2. Most notably, we show that r-ℓWL can count homomorphisms of cactus graphs. This strictly extends classical 1-WL, which can only count homomorphisms of trees and, in fact, is incomparable to k-WL for any fixed k. We empirically validate the expressive and counting power of the proposed r-ℓMPNN on several synthetic datasets and present state-of-the-art predictive performance on various real-world datasets.

MCML Authors
Link to website

Raffaele Paolino

Mathematical Foundations of Artificial Intelligence

Link to website

Sohir Maskey

Mathematical Foundations of Artificial Intelligence

Link to Profile Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[1257]
K. Roth, V. Udandarao, S. Dziadzio, A. Prabhu, M. Cherti, O. Vinyals, O. Hénaff, S. Albanie, M. Bethge and Z. Akata.
A Practitioner's Guide to Continual Multimodal Pretraining.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv GitHub
Abstract

Multimodal foundation models serve numerous applications at the intersection of vision and language. Still, despite being pretrained on extensive data, they become outdated over time. To keep models updated, research into continual pretraining mainly explores scenarios with either (1) infrequent, indiscriminate updates on large-scale new data, or (2) frequent, sample-level updates. However, practical model deployment often operates in the gap between these two limit cases, as real-world applications often demand adaptation to specific subdomains, tasks or concepts – spread over the entire, varying life cycle of a model. In this work, we complement current perspectives on continual pretraining through a research test bed as well as provide comprehensive guidance for effective continual model updates in such scenarios. We first introduce FoMo-in-Flux, a continual multimodal pretraining benchmark with realistic compute constraints and practical deployment requirements, constructed over 63 datasets with diverse visual and semantic coverage. Using FoMo-in-Flux, we explore the complex landscape of practical continual pretraining through multiple perspectives: (1) A data-centric investigation of data mixtures and stream orderings that emulate real-world deployment situations, (2) a method-centric investigation ranging from simple fine-tuning and traditional continual learning strategies to parameter-efficient updates and model merging, (3) meta learning rate schedules and mechanistic design choices, and (4) the influence of model and compute scaling. Together, our insights provide a practitioner’s guide to continual multimodal pretraining for real-world deployment.

MCML Authors
Link to Profile Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1256]
D. Rügamer, B. X. W. Liew, Z. Altai and A. Stöcker.
A Functional Extension of Semi-Structured Networks.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

Semi-structured networks (SSNs) merge the structures familiar from additive models with deep neural networks, allowing the modeling of interpretable partial feature effects while capturing higher-order non-linearities at the same time. A significant challenge in this integration is maintaining the interpretability of the additive model component. Inspired by large-scale biomechanics datasets, this paper explores extending SSNs to functional data. Existing methods in functional data analysis are promising but often not expressive enough to account for all interactions and non-linearities and do not scale well to large datasets. Although the SSN approach presents a compelling potential solution, its adaptation to functional data remains complex. In this work, we propose a functional SSN method that retains the advantageous properties of classical functional regression approaches while also improving scalability. Our numerical experiments demonstrate that this approach accurately recovers underlying signals, enhances predictive performance, and performs favorably compared to competing methods.

MCML Authors
Link to Profile David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[1255]
R. Stolz, H. Krasowski, J. Thumm, M. Eichelbeck, P. Gassert and M. Althoff.
Excluding the Irrelevant: Focusing Reinforcement Learning through Continuous Action Masking.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

Continuous action spaces in reinforcement learning (RL) are commonly defined as multidimensional intervals. While intervals usually reflect the action boundaries for tasks well, they can be challenging for learning because the typically large global action space leads to frequent exploration of irrelevant actions. Yet, little task knowledge can be sufficient to identify significantly smaller state-specific sets of relevant actions. Focusing learning on these relevant actions can significantly improve training efficiency and effectiveness. In this paper, we propose to focus learning on the set of relevant actions and introduce three continuous action masking methods for exactly mapping the action space to the state-dependent set of relevant actions. Thus, our methods ensure that only relevant actions are executed, enhancing the predictability of the RL agent and enabling its use in safety-critical applications. We further derive the implications of the proposed methods on the policy gradient. Using proximal policy optimization (PPO), we evaluate our methods on four control tasks, where the relevant action set is computed based on the system dynamics and a relevant state set. Our experiments show that the three action masking methods achieve higher final rewards and converge faster than the baseline without action masking.
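The paper derives three distinct masking methods; as a minimal illustrative sketch of the underlying idea only (assuming the state-dependent relevant set is an axis-aligned interval per action dimension — an assumption of this example, not the paper's exact formulation), a policy output in [-1, 1] can be rescaled into the relevant bounds so that only relevant actions are ever executed:

```python
import numpy as np

def masked_action(raw_action, low, high):
    """Affinely rescale a policy output from [-1, 1] into the
    state-dependent relevant interval [low, high], per dimension."""
    raw = np.clip(np.asarray(raw_action, dtype=float), -1.0, 1.0)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    return low + 0.5 * (raw + 1.0) * (high - low)

# A state where only actions in [0.2, 0.5] (dim 0) and [-0.1, 0.1] (dim 1)
# are relevant:
a = masked_action([0.0, 1.0], low=[0.2, -0.1], high=[0.5, 0.1])
# a ≈ [0.35, 0.1]
```

How such a mapping interacts with the policy gradient is exactly what the paper analyzes; this sketch only shows the executed-action side.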

MCML Authors
Link to website

Hanna Krasowski

Dr.

* Former member

Link to website

Michael Eichelbeck

Cyber Physical Systems

Link to website

Philipp Gassert

Cyber Physical Systems

Link to Profile Matthias Althoff

Matthias Althoff

Prof. Dr.

Cyber Physical Systems


[1254]
J. Wang, M. Ghahremani, Y. Li, B. Ommer and C. Wachinger.
Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv GitHub
Abstract

Controllable text-to-image (T2I) diffusion models have shown impressive performance in generating high-quality visual content through the incorporation of various conditions. Current methods, however, exhibit limited performance when guided by skeleton human poses, especially in complex pose conditions such as side or rear perspectives of human figures. To address this issue, we present Stable-Pose, a novel adapter model that introduces a coarse-to-fine attention masking strategy into a vision Transformer (ViT) to gain accurate pose guidance for T2I models. Stable-Pose is designed to adeptly handle pose conditions within pre-trained Stable Diffusion, providing a refined and efficient way of aligning pose representation during image synthesis. We leverage the query-key self-attention mechanism of ViTs to explore the interconnections among different anatomical parts in human pose skeletons. Masked pose images are used to smoothly refine the attention maps based on target pose-related features in a hierarchical manner, transitioning from coarse to fine levels. Additionally, our loss function is formulated to allocate increased emphasis to the pose region, thereby augmenting the model’s precision in capturing intricate pose details. We assessed the performance of Stable-Pose across five public datasets under a wide range of indoor and outdoor human pose scenarios. Stable-Pose achieved an AP score of 57.1 in the LAION-Human dataset, marking around 13% improvement over the established technique ControlNet.

MCML Authors
Link to website

Morteza Ghahremani

Dr.

Artificial Intelligence in Radiology

Link to website

Yitong Li

Artificial Intelligence in Radiology

Link to Profile Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Radiology


[1253]
Y. Wang, K. Hu, S. Gupta, Z. Ye, Y. Wang and S. Jegelka.
Understanding the Role of Equivariance in Self-supervised Learning.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv GitHub
Abstract

Contrastive learning has been a leading paradigm for self-supervised learning, but it is widely observed that it comes at the price of sacrificing useful features (e.g., colors) by being invariant to data augmentations. Given this limitation, there has been a surge of interest in equivariant self-supervised learning (E-SSL) that learns features to be augmentation-aware. However, even for the simplest rotation prediction method, there is a lack of rigorous understanding of why, when, and how E-SSL learns useful features for downstream tasks. To bridge this gap between practice and theory, we establish an information-theoretic perspective to understand the generalization ability of E-SSL. In particular, we identify a critical explaining-away effect in E-SSL that creates a synergy between the equivariant and classification tasks. This synergy effect encourages models to extract class-relevant features to improve their equivariant prediction, which, in turn, benefits downstream tasks requiring semantic features. Based on this perspective, we theoretically analyze the influence of data transformations and reveal several principles for practical designs of E-SSL. Our theory not only aligns well with existing E-SSL methods but also sheds light on new directions by exploring the benefits of model equivariance. We believe that a theoretically grounded understanding of the role of equivariance would inspire more principled and advanced designs in this field.

MCML Authors
Link to Profile Stefanie Jegelka

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks


[1252]
Y. Wang, Y. Wu, Z. Wei, S. Jegelka and Y. Wang.
A Theoretical Understanding of Self-Correction through In-context Alignment.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

Going beyond mimicking limited human experiences, recent studies show initial evidence that, like humans, large language models (LLMs) are capable of improving their abilities purely by self-correction, i.e., correcting previous responses through self-examination, in certain circumstances. Nevertheless, little is known about how such capabilities arise. In this work, based on a simplified setup akin to an alignment task, we theoretically analyze self-correction from an in-context learning perspective, showing that when LLMs give relatively accurate self-examinations as rewards, they are capable of refining responses in an in-context way. Notably, going beyond previous theories on over-simplified linear transformers, our theoretical construction underpins the roles of several key designs of realistic transformers for self-correction: softmax attention, multi-head attention, and the MLP block. We validate these findings extensively on synthetic datasets. Inspired by these findings, we also illustrate novel applications of self-correction, such as defending against LLM jailbreaks, where a simple self-correction step does make a large difference. We believe that these findings will inspire further research on understanding, exploiting, and enhancing self-correction for building better foundation models.

MCML Authors
Link to Profile Stefanie Jegelka

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks


[1251]
D. Winkel, N. Strauß, M. Bernhard, Z. Li, T. Seidl and M. Schubert.
Autoregressive Policy Optimization for Constrained Allocation Tasks.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv GitHub
Abstract

Allocation tasks represent a class of problems where a limited amount of resources must be allocated to a set of entities at each time step. Prominent examples of this task include portfolio optimization or distributing computational workloads across servers. Allocation tasks are typically bound by linear constraints describing practical requirements that have to be strictly fulfilled at all times. In portfolio optimization, for example, investors may be obligated to allocate less than 30% of the funds into a certain industrial sector in any investment period. Such constraints restrict the action space of allowed allocations in intricate ways, which makes learning a policy that avoids constraint violations difficult. In this paper, we propose a new method for constrained allocation tasks based on an autoregressive process to sequentially sample allocations for each entity. In addition, we introduce a novel de-biasing mechanism to counter the initial bias caused by sequential sampling. We demonstrate the superior performance of our approach compared to a variety of Constrained Reinforcement Learning (CRL) methods on three distinct constrained allocation tasks: portfolio optimization, computational workload distribution, and a synthetic allocation benchmark.
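As a hedged illustration of the autoregressive sampling idea (uniform draws stand in for the learned policy, and the paper's de-biasing mechanism is omitted), allocations can be sampled entity by entity while keeping the remaining budget feasible under per-entity caps such as the 30%-sector constraint above:

```python
import random

def sample_allocation(n_entities, caps, total=1.0):
    """Sequentially sample an allocation summing to `total` while
    respecting per-entity upper bounds `caps[i]`. Illustrative only:
    the paper samples from a learned autoregressive policy."""
    alloc = []
    remaining = total
    for i in range(n_entities):
        # Feasibility: leave enough budget for the later entities to absorb.
        lo = max(0.0, remaining - sum(caps[i + 1:]))
        hi = min(caps[i], remaining)
        x = hi if i == n_entities - 1 else random.uniform(lo, hi)
        alloc.append(x)
        remaining -= x
    return alloc

# Three entities, each capped at 30%, 30%, and 50% of the budget:
alloc = sample_allocation(3, [0.3, 0.3, 0.5])
```

Because earlier entities constrain later ones, this sequential scheme is biased toward the front of the ordering — the motivation for the de-biasing mechanism the paper introduces.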

MCML Authors
Link to website

David Winkel

Database Systems & Data Mining

Link to website

Niklas Strauß

Database Systems & Data Mining

Link to website

Maximilian Bernhard

Database Systems & Data Mining

Link to website

Zongyue Li

Database Systems & Data Mining

Link to Profile Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining

Link to Profile Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[1250]
M. Yau, N. Karalias, E. Lu, J. Xu and S. Jegelka.
Are Graph Neural Networks Optimal Approximation Algorithms?
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

In this work we design graph neural network architectures that capture optimal approximation algorithms for a large class of combinatorial optimization problems, using powerful algorithmic tools from semidefinite programming (SDP). Concretely, we prove that polynomial-sized message-passing algorithms can represent the most powerful polynomial time algorithms for Max Constraint Satisfaction Problems assuming the Unique Games Conjecture. We leverage this result to construct efficient graph neural network architectures, OptGNN, that obtain high-quality approximate solutions on landmark combinatorial optimization problems such as Max-Cut, Min-Vertex-Cover, and Max-3-SAT. Our approach achieves strong empirical results across a wide range of real-world and synthetic datasets against solvers and neural baselines. Finally, we take advantage of OptGNN’s ability to capture convex relaxations to design an algorithm for producing bounds on the optimal solution from the learned embeddings of OptGNN.

MCML Authors
Link to Profile Stefanie Jegelka

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks


[1249]
Y. Zhang, Y. Li, X. Wang, Q. Shen, B. Plank, B. Bischl, M. Rezaei and K. Kawaguchi.
FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models.
NeurIPS 2024 - Workshop on Machine Learning and Compression at the 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

Overparametrized transformer networks are the state-of-the-art architecture for Large Language Models (LLMs). However, such models contain billions of parameters, making large compute a necessity, while raising environmental concerns. To address these issues, we propose FinerCut, a new form of fine-grained layer pruning, which in contrast to prior work at the transformer block level, considers all self-attention and feed-forward network (FFN) layers within blocks as individual pruning candidates. FinerCut prunes layers whose removal causes minimal alteration to the model’s output – contributing to a new, lean, interpretable, and task-agnostic pruning method. Tested across 9 benchmarks, our approach retains 90% performance of Llama3-8B with 25% layers removed, and 95% performance of Llama3-70B with 30% layers removed, all without fine-tuning or post-pruning reconstruction. Strikingly, we observe intriguing results with FinerCut: 42% (34 out of 80) of the self-attention layers in Llama3-70B can be removed while preserving 99% of its performance – without additional fine-tuning after removal. Moreover, FinerCut provides a tool to inspect the types and locations of pruned layers, allowing us to observe interesting pruning behaviors. For instance, we observe a preference for pruning self-attention layers, often at deeper consecutive decoder layers. We hope our insights inspire future efficient LLM architecture designs.
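A toy sketch of the pruning criterion — score each layer by how much the model output changes when that layer is skipped, then treat the lowest-scoring layers as pruning candidates. The residual network below is synthetic; FinerCut applies this idea to the attention and FFN layers of real LLMs, so every name here is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)
# Toy residual network: each "layer" adds a small update, so any layer
# can be skipped without shape mismatches (as in transformer blocks).
layers = [rng.normal(scale=0.1, size=(8, 8)) for _ in range(5)]

def forward(x, skip=None):
    h = x
    for i, W in enumerate(layers):
        if i != skip:
            h = h + W @ h  # residual update
    return h

full = forward(x)
# Importance score: mean squared change of the output when layer i is removed.
scores = {i: float(np.mean((full - forward(x, skip=i)) ** 2))
          for i in range(len(layers))}
prune_order = sorted(scores, key=scores.get)  # least-impactful layers first
```

In the paper the comparison is done on model outputs over calibration data rather than a single random input, but the ranking principle is the same.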

MCML Authors
Link to website

Yawei Li

Statistical Learning & Data Science

Link to website

Xinpeng Wang

Artificial Intelligence and Computational Linguistics

Link to Profile Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to website

Mina Rezaei

Dr.

Statistical Learning & Data Science


[1248]
M. Koshil, T. Nagler, M. Feurer and K. Eggensperger.
Towards Localization via Data Embedding for TabPFN.
TLR @NeurIPS 2024 - 3rd Table Representation Learning Workshop at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. URL
Abstract

Prior-data fitted networks (PFNs), especially TabPFN, have shown significant promise in tabular data prediction. However, their scalability is limited by the quadratic complexity of the transformer architecture’s attention across training points. In this work, we propose a method to localize TabPFN, which embeds data points into a learned representation and performs nearest neighbor selection in this space. We evaluate it across six datasets, demonstrating its superior performance over standard TabPFN when scaling to larger datasets. We also explore its design choices and analyze the bias-variance trade-off of this localization method, showing that it reduces bias while maintaining manageable variance. This work opens up a pathway for scaling TabPFN to arbitrarily large tabular datasets.
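The localization step can be sketched as nearest-neighbor selection in an embedding space; only the selected points are then passed as context to the quadratic-attention in-context learner. The learned embedding and the TabPFN forward pass are assumed to exist elsewhere — this is not the authors' code:

```python
import numpy as np

def select_context(query_emb, train_embs, k):
    """Return indices of the k training points closest to the query
    in the (learned) embedding space; these form TabPFN's context."""
    d = np.linalg.norm(train_embs - query_emb, axis=1)
    return np.argsort(d)[:k]

train_embs = np.array([[0.0], [1.0], [2.0], [10.0]])
idx = select_context(np.array([1.2]), train_embs, k=2)
# idx contains the two nearest points, indices 1 and 2
```

Keeping k fixed caps the attention cost per prediction regardless of the full training-set size, which is the scalability argument in the abstract.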

MCML Authors
Link to Profile Thomas Nagler

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

Link to Profile Matthias Feurer

Matthias Feurer

Prof. Dr.

Statistical Learning & Data Science


[1247]
C. Leiber, N. Strauß, M. Schubert and T. Seidl.
Dying Clusters Is All You Need – Deep Clustering With an Unknown Number of Clusters.
DLC @ICDM 2024 - 6th Workshop on Deep Learning and Clustering at the 24th IEEE International Conference on Data Mining (ICDM 2024). Abu Dhabi, United Arab Emirates, Dec 09-12, 2024. To be published. Preprint available. arXiv GitHub
Abstract

Finding meaningful groups, i.e., clusters, in high-dimensional data such as images or texts without labeled data at hand is an important challenge in data mining. In recent years, deep clustering methods have achieved remarkable results in these tasks. However, most of these methods require the user to specify the number of clusters in advance. This is a major limitation since the number of clusters is typically unknown if labeled data is unavailable. Thus, an area of research has emerged that addresses this problem. Most of these approaches estimate the number of clusters separately from the clustering process. This results in a strong dependency of the clustering result on the quality of the initial embedding. Other approaches are tailored to specific clustering processes, making them hard to adapt to other scenarios. In this paper, we propose UNSEEN, a general framework that, starting from a given upper bound, is able to estimate the number of clusters. To the best of our knowledge, it is the first method that can be easily combined with various deep clustering algorithms. We demonstrate the applicability of our approach by combining UNSEEN with the popular deep clustering algorithms DCN, DEC, and DKM and verify its effectiveness through an extensive experimental evaluation on several image and tabular datasets. Moreover, we perform numerous ablations to analyze our approach and show the importance of its components.

MCML Authors
Collin Leiber

Collin Leiber

* Former member

Link to website

Niklas Strauß

Database Systems & Data Mining

Link to Profile Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining

Link to Profile Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[1246]
U. Fischer Abaigar, C. Kern, N. Barda and F. Kreuter.
Bridging the gap: Towards an expanded toolkit for AI-driven decision-making in the public sector.
Government Information Quarterly 41.4 (Dec. 2024). DOI
Abstract

AI-driven decision-making systems are becoming instrumental in the public sector, with applications spanning areas like criminal justice, social welfare, financial fraud detection, and public health. While these systems offer great potential benefits to institutional decision-making processes, such as improved efficiency and reliability, these systems face the challenge of aligning machine learning (ML) models with the complex realities of public sector decision-making. In this paper, we examine five key challenges where misalignment can occur, including distribution shifts, label bias, the influence of past decision-making on the data side, as well as competing objectives and human-in-the-loop on the model output side. Our findings suggest that standard ML methods often rely on assumptions that do not fully account for these complexities, potentially leading to unreliable and harmful predictions. To address this, we propose a shift in modeling efforts from focusing solely on predictive accuracy to improving decision-making outcomes. We offer guidance for selecting appropriate modeling frameworks, including counterfactual prediction and policy learning, by considering how the model estimand connects to the decision-maker’s utility. Additionally, we outline technical methods that address specific challenges within each modeling approach. Finally, we argue for the importance of external input from domain experts and stakeholders to ensure that model assumptions and design choices align with real-world policy objectives, taking a step towards harmonizing AI and public sector objectives.

MCML Authors
Link to website

Unai Fischer Abaigar

Social Data Science and AI Lab

Link to Profile Christoph Kern

Christoph Kern

Prof. Dr.

Social Data Science and AI Lab

Link to Profile Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab


[1245]
L. Bothmann and K. Peters.
Fairness von KI – ein Brückenschlag zwischen Philosophie und Maschinellem Lernen.
Grenzen Künstlicher Intelligenz (Dec. 2024).
MCML Authors
Link to website

Ludwig Bothmann

Dr.

Statistical Learning & Data Science


[1244]
T. Hannan, R. Koner, M. Bernhard, S. Shit, B. Menze, V. Tresp, M. Schubert and T. Seidl.
GRAtt-VIS: Gated Residual Attention for Video Instance Segmentation.
ICPR 2024 - 27th International Conference on Pattern Recognition. Kolkata, India, Dec 01-05, 2024. DOI GitHub
Abstract

Recent trends in Video Instance Segmentation (VIS) have seen a growing reliance on online methods to model complex and lengthy video sequences. However, the degradation of representation and noise accumulation of the online methods, especially during occlusion and abrupt changes, pose substantial challenges. Transformer-based query propagation provides promising directions at the cost of quadratic memory attention. However, they are susceptible to the degradation of instance features due to the above-mentioned challenges and suffer from cascading effects. The detection and rectification of such errors remain largely underexplored. To this end, we introduce GRAtt-VIS, Gated Residual Attention for Video Instance Segmentation. Firstly, we leverage a Gumbel-Softmax-based gate to detect possible errors in the current frame. Next, based on the gate activation, we rectify degraded features from its past representation. Such a residual configuration alleviates the need for dedicated memory and provides a continuous stream of relevant instance features. Secondly, we propose a novel inter-instance interaction using gate activation as a mask for self-attention. This masking strategy dynamically restricts the unrepresentative instance queries in the self-attention and preserves vital information for long-term tracking. We refer to this novel combination of Gated Residual Connection and Masked Self-Attention as the GRAtt block, which can easily be integrated into the existing propagation-based framework. Further, GRAtt blocks significantly reduce the attention overhead and simplify dynamic temporal modeling. GRAtt-VIS achieves state-of-the-art performance on YouTube-VIS and the highly challenging OVIS dataset, significantly improving over previous methods.

MCML Authors
Link to website

Tanveer Hannan

Database Systems & Data Mining

Link to website

Rajat Koner

Database Systems & Data Mining

Link to website

Maximilian Bernhard

Database Systems & Data Mining

Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining

Link to Profile Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining

Link to Profile Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[1243]
Y. N. Böck, H. Boche, F. H. P. Fitzek and G. Kutyniok.
Computing-Model and Computing-Hardware Selection for ICT Under Societal and Judicial Constraints.
IEEE Access 12 (Dec. 2024). DOI
Abstract

This article discusses a formalization of aspects of Cyber-Sovereignty (CyS) for information and communication technology (ICT), linking them to technological trustworthiness and deriving an associated paradigm for hard- and software design. The upcoming 6G ICT standard is considered a keystone within modern society’s increasing interconnectedness and automatization, as it provides the necessary technological infrastructure for applications such as the Metaverse or large-scale digital twinning. Since emerging technological systems increasingly affect sensitive human goods, hard- and software manufacturers must consider a new dimension of societal and judicial constraints in the context of technological trustworthiness. This article aims to establish a formalized theory of specific aspects of CyS, providing a paradigm for hard- and software engineering in ICT. This paradigm is directly applicable in formal technology assessment and ensures that the relevant facets of CyS – specifically, the principle of Algorithmic Transparency (AgT) – are satisfied. The framework follows an axiomatic approach. Particularly, the formal basis of our theory consists of four fundamental assumptions about the general nature of physical problems and algorithmic implementations. This formal basis allows for drawing general conclusions on the relation between CyS and technological trustworthiness and entails a formal meta-thesis on AgT in digital computing.

MCML Authors
Link to Profile Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[1242]
A. Höhl, I. Obadic, M.-Á. Fernández-Torres, H. Najjar, D. Oliveira, Z. Akata, A. Dengel and X. Zhu.
Opening the Black Box: A systematic review on explainable artificial intelligence in remote sensing.
IEEE Geoscience and Remote Sensing Magazine 12.4 (Dec. 2024). DOI
Abstract

In recent years, black-box machine learning approaches have become a dominant modeling paradigm for knowledge extraction in remote sensing. Despite the potential benefits of uncovering the inner workings of these models with explainable AI, a comprehensive overview summarizing the explainable AI methods used and their objectives, findings, and challenges in remote sensing applications is still missing. In this paper, we address this gap by performing a systematic review to identify the key trends in the field and shed light on novel explainable AI approaches and emerging directions that tackle specific remote sensing challenges. We also reveal the common patterns of explanation interpretation, discuss the extracted scientific insights, and reflect on the approaches used for the evaluation of explainable AI methods. As such, our review provides a complete summary of the state-of-the-art of explainable AI in remote sensing. Further, we give a detailed outlook on the challenges and promising research directions, representing a basis for novel methodological development and a useful starting point for new researchers in the field.

MCML Authors
Link to website

Adrian Höhl

Data Science in Earth Observation

Link to website

Ivica Obadic

Data Science in Earth Observation

Link to Profile Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning

Link to Profile Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[1241]
S. Zhao, Z. Chen, Z. Xiong, Y. Shi, S. Saha and X. Zhu.
Beyond Grid Data: Exploring graph neural networks for Earth observation.
IEEE Geoscience and Remote Sensing Magazine Early access (Dec. 2024). DOI
Abstract

Earth Observation (EO) data analysis has been significantly revolutionized by deep learning (DL), with applications typically limited to grid-like data structures. Graph Neural Networks (GNNs) emerge as an important innovation, propelling DL into the non-Euclidean domain. Naturally, GNNs can effectively tackle the challenges posed by diverse modalities, multiple sensors, and the heterogeneous nature of EO data. To introduce GNNs in the related domains, our review begins by offering fundamental knowledge on GNNs. Then, we summarize the generic problems in EO, to which GNNs can offer potential solutions. Following this, we explore a broad spectrum of GNNs’ applications to scientific problems in Earth systems, covering areas such as weather and climate analysis, disaster management, air quality monitoring, agriculture, land cover classification, hydrological process modeling, and urban modeling. The rationale behind adopting GNNs in these fields is explained, alongside methodologies for organizing graphs and designing favorable architectures for various tasks. Furthermore, we highlight methodological challenges of implementing GNNs in these domains and possible solutions that could guide future research. While acknowledging that GNNs are not a universal solution, we conclude the paper by comparing them with other popular architectures like transformers and analyzing their potential synergies.

MCML Authors
Link to website

Zhaiyu Chen

Data Science in Earth Observation

Link to Profile Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[1240]
Q. Sun, A. Akman, X. Jing, M. Milling and B. W. Schuller.
Audio-based Kinship Verification Using Age Domain Conversion.
IEEE Signal Processing Letters 32 (Dec. 2024). DOI
Abstract

Audio-based kinship verification (AKV) is important in many domains, such as home security monitoring, forensic identification, and social network analysis. A key challenge in the task arises from differences in age across samples from different individuals, which can be interpreted as a domain bias in a cross-domain verification task. To address this issue, we design the notion of an ‘age-standardised domain’ wherein we utilise the optimised CycleGAN-VC3 network to perform age-audio conversion to generate the in-domain audio. The generated audio dataset is employed to extract a range of features, which are then fed into a metric learning architecture to verify kinship. Experiments are conducted on the KAN_AV audio dataset, which contains age and kinship labels. The results demonstrate that the method markedly enhances the accuracy of kinship verification, while also offering novel insights for future kinship verification research.

MCML Authors
Link to Profile Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[1239]
J. Herbinger, M. N. Wright, T. Nagler, B. Bischl and G. Casalicchio.
Decomposing Global Feature Effects Based on Feature Interactions.
Journal of Machine Learning Research 25.381 (Dec. 2024). URL
Abstract

Global feature effect methods, such as partial dependence plots, provide an intelligible visualization of the expected marginal feature effect. However, such global feature effect methods can be misleading, as they do not represent local feature effects of single observations well when feature interactions are present. We formally introduce generalized additive decomposition of global effects (GADGET), which is a new framework based on recursive partitioning to find interpretable regions in the feature space such that the interaction-related heterogeneity of local feature effects is minimized. We provide a mathematical foundation of the framework and show that it is applicable to the most popular methods to visualize marginal feature effects, namely partial dependence, accumulated local effects, and Shapley additive explanations (SHAP) dependence. Furthermore, we introduce and validate a new permutation-based interaction detection procedure that is applicable to any feature effect method that fits into our proposed framework. We empirically evaluate the theoretical characteristics of the proposed methods based on various feature effect methods in different experimental settings. Moreover, we apply our introduced methodology to three real-world examples to showcase their usefulness.

MCML Authors
Link to Profile Thomas Nagler

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to website

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science


[1238]
H. Weingärtner, M. Windl, L. L. Chuang and F. Draxler.
Useful but Distracting: Viewer Experience with Keyword Highlights and Time-Synchronization in Captions for Language Learning.
MUM 2024 - 23rd International Conference on Mobile and Ubiquitous Multimedia. Stockholm, Sweden, Dec 01-04, 2024. DOI
Abstract

Captions are a valuable scaffold for language learners, aiding comprehension and vocabulary acquisition. Past work has proposed enhancements such as keyword highlights for increased learning gains. However, little is known about learners’ experience with enhanced captions, although this is critical for adoption in everyday life. We conducted a survey and focus group to elicit learner preferences and requirements and implemented a processing pipeline for enhanced captions with keyword highlights, time-synchronized keyword highlights, and keyword captions. A subsequent online study (n = 66) showed that time-synchronized keyword highlights were the preferred design for learning but were perceived as too distracting to replace standard captions in everyday viewing scenarios. We conclude that keyword highlights and time-synchronization are suitable for integrating learning into an entertaining everyday-life activity, but the design should be optimized to provide a more seamless experience.

MCML Authors
Link to website

Maximiliane Windl

Human-Centered Ubiquitous Media


[1237]
J. Senoner, S. Schallmoser, B. Kratzwald, S. Feuerriegel and T. Netland.
Explainable AI improves task performance in human–AI collaboration.
Scientific Reports 14.31150 (Dec. 2024). DOI
Abstract

Artificial intelligence (AI) provides considerable opportunities to assist human work. However, one crucial challenge of human-AI collaboration is that many AI algorithms operate in a black-box manner where the way the AI makes predictions remains opaque. This makes it difficult for humans to validate a prediction made by AI against their own domain knowledge. For this reason, we hypothesize that augmenting humans with explainable AI as a decision aid improves task performance in human-AI collaboration. To test this hypothesis, we analyze the effect of augmenting domain experts with explainable AI in the form of visual heatmaps. We then compare participants who were either supported by (a) black-box AI or (b) explainable AI, where the latter supports them to follow AI predictions when the AI is accurate or overrule the AI when the AI predictions are wrong. We conducted two preregistered experiments with representative, real-world visual inspection tasks from manufacturing and medicine. The first experiment was conducted with factory workers from an electronics factory, who performed N=9,600 assessments of whether electronic products have defects. The second experiment was conducted with radiologists, who performed N=5,650 assessments of chest X-ray images to identify lung lesions. The results of our experiments with domain experts performing real-world tasks show that task performance improves when participants are supported by explainable AI instead of black-box AI. For example, in the manufacturing setting, we find that augmenting participants with explainable AI (as opposed to black-box AI) leads to a five-fold decrease in the median error rate of human decisions, which gives a significant improvement in task performance.

MCML Authors
Link to website

Simon Schallmoser

Artificial Intelligence in Management

Link to Profile Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1236]
A. H. Berger, L. Lux, A. Weers, M. Menten, D. Rückert and J. C. Paetzold.
Pitfalls of topology-aware image segmentation.
Preprint (Dec. 2024). arXiv
Abstract

Topological correctness, i.e., the preservation of structural integrity and specific characteristics of shape, is a fundamental requirement for medical imaging tasks, such as neuron or vessel segmentation. Despite the recent surge in topology-aware methods addressing this challenge, their real-world applicability is hindered by flawed benchmarking practices. In this paper, we identify critical pitfalls in model evaluation that include inadequate connectivity choices, overlooked topological artifacts in ground truth annotations, and inappropriate use of evaluation metrics. Through detailed empirical analysis, we uncover these issues’ profound impact on the evaluation and ranking of segmentation methods. Drawing from our findings, we propose a set of actionable recommendations to establish fair and robust evaluation standards for topology-aware medical image segmentation methods.

MCML Authors
Link to website

Laurin Lux

Artificial Intelligence in Healthcare and Medicine

Link to Profile Martin Menten

Martin Menten

Dr.

AI in Medicine

Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[1235]
A. Baumann, R. Li, M. Klasson, S. Mentu, S. Karthik, Z. Akata, A. Solin and M. Trapp.
Post-hoc Probabilistic Vision-Language Models.
Preprint (Dec. 2024). arXiv
Abstract

Vision-language models (VLMs), such as CLIP and SigLIP, have found remarkable success in classification, retrieval, and generative tasks. For this, VLMs deterministically map images and text descriptions to a joint latent space in which their similarity is assessed using the cosine similarity. However, a deterministic mapping of inputs fails to capture uncertainties over concepts arising from domain shifts when used in downstream tasks. In this work, we propose post-hoc uncertainty estimation in VLMs that does not require additional training. Our method leverages a Bayesian posterior approximation over the last layers in VLMs and analytically quantifies uncertainties over cosine similarities. We demonstrate its effectiveness for uncertainty quantification and support set selection in active learning. Compared to baselines, we obtain improved and well-calibrated predictive uncertainties, interpretable uncertainty estimates, and sample-efficient active learning. Our results show promise for safety-critical applications of large-scale models.

MCML Authors
Link to Profile Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1234]
J. Baumsteiger, L. Celiberti, P. Rinke, M. Todorović and C. Franchini.
Exploring Noncollinear Magnetic Energy Landscapes with Bayesian Optimization.
Preprint (Dec. 2024). arXiv
Abstract

The investigation of magnetic energy landscapes and the search for ground states of magnetic materials using ab initio methods like density functional theory (DFT) is a challenging task. Complex interactions, such as superexchange and spin-orbit coupling, make these calculations computationally expensive and often lead to non-trivial energy landscapes. Consequently, a comprehensive and systematic investigation of large magnetic configuration spaces is often impractical. We approach this problem by utilizing Bayesian Optimization, an active machine learning scheme that has proven to be efficient in modeling unknown functions and finding global minima. Using this approach we can obtain the magnetic contribution to the energy as a function of one or more spin canting angles with relatively small numbers of DFT calculations. To assess the capabilities and the efficiency of the approach we investigate the noncollinear magnetic energy landscapes of selected materials containing 3d, 5d and 5f magnetic ions: Ba3MnNb2O9, LaMn2Si2, β-MnO2, Sr2IrO4, UO2 and Ba2NaOsO6. By comparing our results to previous ab initio studies that followed more conventional approaches, we observe significant improvements in efficiency.

MCML Authors
Link to Profile Patrick Rinke

Patrick Rinke

Prof. Dr.

AI-based Material Science


[1233]
B. Chen, S. Peng, A. Korhonen and B. Plank.
A Rose by Any Other Name: LLM-Generated Explanations Are Good Proxies for Human Explanations to Collect Label Distributions on NLI.
Preprint (Dec. 2024). arXiv
Abstract

Disagreement in human labeling is ubiquitous and can be captured in human judgment distributions (HJDs). Recent research has shown that explanations provide valuable information for understanding human label variation (HLV) and that large language models (LLMs) can approximate HJDs from a few human-provided label-explanation pairs. However, collecting explanations for every label is still time-consuming. This paper examines whether LLMs can be used to replace humans in generating explanations for approximating HJDs. Specifically, we use LLMs as annotators to generate model explanations for a few given human labels. We test ways to obtain and combine these label-explanations with the goal of approximating the human judgment distribution. We further compare the resulting human- and model-generated explanations, and test automatic and human explanation selection. Our experiments show that LLM explanations are promising for NLI: to estimate HJDs, generated explanations yield results comparable to human explanations when provided with human labels. Importantly, our results generalize from datasets with human explanations to i) datasets where they are not available and ii) challenging out-of-distribution test sets.

MCML Authors
Link to website

Beiduo Chen

Artificial Intelligence and Computational Linguistics

Link to website

Siyao Peng

Dr.

Artificial Intelligence and Computational Linguistics

Link to Profile Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1232]
S. Dziadzio, V. Udandarao, K. Roth, A. Prabhu, Z. Akata, S. Albanie and M. Bethge.
How to Merge Your Multimodal Models Over Time?
Preprint (Dec. 2024). arXiv
Abstract

Model merging combines multiple expert models - finetuned from a base foundation model on diverse tasks and domains - into a single, more capable model. However, most existing model merging approaches assume that all experts are available simultaneously. In reality, new tasks and domains emerge progressively over time, requiring strategies to integrate the knowledge of expert models as they become available: a process we call temporal model merging. The temporal dimension introduces unique challenges not addressed in prior work, raising new questions such as: when training for a new task, should the expert model start from the merged past experts or from the original base model? Should we merge all models at each time step? Which merging techniques are best suited for temporal merging? Should different strategies be used to initialize the training and deploy the model? To answer these questions, we propose a unified framework called TIME - Temporal Integration of Model Expertise - which defines temporal model merging across three axes: (1) Initialization Phase, (2) Deployment Phase, and (3) Merging Technique. Using TIME, we study temporal model merging across model sizes, compute budgets, and learning horizons on the FoMo-in-Flux benchmark. Our comprehensive suite of experiments across TIME allows us to uncover key insights for temporal model merging, offering a better understanding of current challenges and best practices for effective temporal model merging.

MCML Authors
Link to Profile Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1231]
M. Fischer, P. Neher, P. J. Schüffler, S. Ziegler, S. Xiao, R. Peretzke, D. Clunie, C. Ulrich, M. Baumgartner, A. Muckenhuber, S. Dias Almeida, M. Götz, J. Kleesiek, M. Nolden, R. Braren and K. Maier-Hein.
Unlocking the Potential of Digital Pathology: Novel Baselines for Compression.
Preprint (Dec. 2024). arXiv
Abstract

Digital pathology offers a groundbreaking opportunity to transform clinical practice in histopathological image analysis, yet faces a significant hurdle: the substantial file sizes of pathological Whole Slide Images (WSI). While current digital pathology solutions rely on lossy JPEG compression to address this issue, lossy compression can introduce color and texture disparities, potentially impacting clinical decision-making. While prior research addresses perceptual image quality and downstream performance independently of each other, we jointly evaluate compression schemes for perceptual and downstream task quality on four different datasets. In addition, we collect an initially uncompressed dataset for an unbiased perceptual evaluation of compression schemes. Our results show that deep learning models fine-tuned for perceptual quality outperform conventional compression schemes like JPEG-XL or WebP for further compression of WSI. However, they exhibit a significant bias towards the compression artifacts present in the training data and struggle to generalize across various compression schemes. We introduce a novel evaluation metric based on feature similarity between original files and compressed files that aligns very well with the actual downstream performance on the compressed WSI. Our metric allows for a general and standardized evaluation of lossy compression schemes and mitigates the requirement to independently assess different downstream tasks. Our study provides novel insights for the assessment of lossy compression schemes for WSI and encourages a unified evaluation of lossy compression schemes to accelerate the clinical uptake of digital pathology.

MCML Authors
Link to Profile Peter Schüffler

Peter Schüffler

Prof. Dr.

Computational Pathology


[1230]
F. Fumagalli, M. Muschalik, E. Hüllermeier, B. Hammer and J. Herbinger.
Unifying Feature-Based Explanations with Functional ANOVA and Cooperative Game Theory.
Preprint (Dec. 2024). arXiv
Abstract

Feature-based explanations, using perturbations or gradients, are a prevalent tool to understand decisions of black box machine learning models. Yet, differences between these methods still remain mostly unknown, which limits their applicability for practitioners. In this work, we introduce a unified framework for local and global feature-based explanations using two well-established concepts: functional ANOVA (fANOVA) from statistics, and the notion of value and interaction from cooperative game theory. We introduce three fANOVA decompositions that determine the influence of feature distributions, and use game-theoretic measures, such as the Shapley value and interactions, to specify the influence of higher-order interactions. Our framework combines these two dimensions to uncover similarities and differences between a wide range of explanation techniques for features and groups of features. We then empirically showcase the usefulness of our framework on synthetic and real-world datasets.

MCML Authors
Link to website

Maximilian Muschalik

Artificial Intelligence & Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[1229]
F. Fundel, J. Schusterbauer, V. T. Hu and B. Ommer.
Distillation of Diffusion Features for Semantic Correspondence.
Preprint (Dec. 2024). arXiv
Abstract

Semantic correspondence, the task of determining relationships between different parts of images, underpins various applications including 3D reconstruction, image-to-image translation, object tracking, and visual place recognition. Recent studies have begun to explore representations learned in large generative image models for semantic correspondence, demonstrating promising results. Building on this progress, current state-of-the-art methods rely on combining multiple large models, resulting in high computational demands and reduced efficiency. In this work, we address this challenge by proposing a more computationally efficient approach. We propose a novel knowledge distillation technique to overcome the problem of reduced efficiency. We show how to use two large vision foundation models and distill the capabilities of these complementary models into one smaller model that maintains high accuracy at reduced computational cost. Furthermore, we demonstrate that by incorporating 3D data, we are able to further improve performance, without the need for human-annotated correspondences. Overall, our empirical results demonstrate that our distilled model with 3D data augmentation achieves performance superior to current state-of-the-art methods while significantly reducing computational load and enhancing practicality for real-world applications, such as semantic video correspondence. Our code and weights are publicly available on our project page.

MCML Authors
Link to Profile Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[1228]
J. Homer, O. Friedrich and D. Grün.
Simulation-based inference has its own Dodelson-Schneider effect (but it knows that it does).
Preprint (Dec. 2024). arXiv
Abstract

Making inferences about physical properties of the Universe requires knowledge of the data likelihood. A Gaussian distribution is commonly assumed for the uncertainties, with a covariance matrix estimated from a set of simulations. The noise in such covariance estimates causes two problems: it distorts the width of the parameter contours, and it adds scatter to the location of those contours which is not captured by the widths themselves. For non-Gaussian likelihoods, an approximation may be derived via Simulation-Based Inference (SBI). It is often implicitly assumed that parameter constraints from SBI analyses, which do not use covariance matrices, are not affected by the same problems as parameter estimation with a covariance matrix estimated from simulations. We investigate whether SBI suffers from effects similar to those of covariance estimation in Gaussian likelihoods. We use Neural Posterior and Likelihood Estimation with continuous and masked autoregressive normalizing flows for density estimation. We fit our approximate posterior models to simulations drawn from a Gaussian linear model, so that the SBI result can be compared to the true posterior. We test linear and neural network based compression, demonstrating that neither method circumvents the issues of covariance estimation. SBI suffers an inflation of posterior variance that is equal to or greater than the analytical result in covariance estimation for Gaussian likelihoods for the same number of simulations. The assumption that SBI requires a smaller number of simulations than covariance estimation for a Gaussian likelihood analysis is inaccurate. The limitations of traditional likelihood analysis with simulation-based covariance remain for SBI with a finite simulation budget. Despite these issues, we show that SBI correctly draws the true posterior contour given enough simulations.

MCML Authors
Link to website

Jed Homer

Astrophysics, Cosmology and Artificial Intelligence

Link to Profile Daniel Grün

Daniel Grün

Prof. Dr.

Astrophysics, Cosmology and Artificial Intelligence


[1227]
V. T. Hu and B. Ommer.
[MASK] is All You Need.
Preprint (Dec. 2024). arXiv
Abstract

In generative models, two paradigms have gained traction in various applications: next-set prediction-based Masked Generative Models and next-noise prediction-based Non-Autoregressive Models, e.g., Diffusion Models. In this work, we propose using discrete-state models to connect them and explore their scalability in the vision domain. First, we conduct a step-by-step analysis in a unified design space across two types of models, covering timestep-independence, noise schedule, temperature, guidance strength, etc., in a scalable manner. Second, we re-cast typical discriminative tasks, e.g., image segmentation, as an unmasking process from [MASK] tokens on a discrete-state model. This enables us to perform various sampling processes, including flexible conditional sampling by only training once to model the joint distribution. All aforementioned explorations lead to our framework named Discrete Interpolants, which enables us to achieve state-of-the-art or competitive performance compared to previous discrete-state based methods in various benchmarks, like ImageNet256, MS COCO, and the video dataset FaceForensics. In summary, by leveraging [MASK] in discrete-state models, we can bridge Masked Generative and Non-autoregressive Diffusion models, as well as generative and discriminative tasks.

MCML Authors
Link to Profile Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[1226]
S. Kim, R. Xiao, M.-I. Georgescu, S. Alaniz and Z. Akata.
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training.
Preprint (Dec. 2024). arXiv
Abstract

Vision-Language Models (VLMs) trained with contrastive loss have achieved significant advancements in various vision and language tasks. However, the global nature of contrastive loss makes VLMs focus predominantly on foreground objects, neglecting other crucial information in the image, which limits their effectiveness in downstream tasks. To address these challenges, we propose COSMOS: CrOSs-MOdality Self-distillation for vision-language pre-training that integrates a novel text-cropping strategy and cross-attention module into a self-supervised learning framework. We create global and local views of images and texts (i.e., multi-modal augmentations), which are essential for self-distillation in VLMs. We further introduce a cross-attention module, enabling COSMOS to learn comprehensive cross-modal representations optimized via a cross-modality self-distillation loss. COSMOS consistently outperforms previous strong baselines on various zero-shot downstream tasks, including retrieval, classification, and semantic segmentation. Additionally, it surpasses CLIP-based models trained on larger datasets in visual perception and contextual understanding tasks.

MCML Authors
Link to Profile Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1225]
J. Külz, M. Terzer, M. Magri, A. Giusti and M. Althoff.
Holistic Construction Automation with Modular Robots: From High-Level Task Specification to Execution.
Preprint (Dec. 2024). arXiv
Abstract

In situ robotic automation in construction is challenging due to constantly changing environments, a shortage of robotic experts, and a lack of standardized frameworks bridging robotics and construction practices. This work proposes a holistic framework for construction task specification, optimization of robot morphology, and mission execution using a mobile modular reconfigurable robot. Users can specify and monitor the desired robot behavior through a graphical interface. Our framework identifies an optimized robot morphology and enables automatic real-world execution by integrating Building Information Modelling (BIM). By leveraging modular robot components, we ensure seamless and fast adaptation to the specific demands of the construction task. Experimental validation demonstrates that our approach robustly enables the autonomous execution of robotic drilling.

MCML Authors
Link to website

Jonathan Külz

Cyber Physical Systems

Link to Profile Matthias Althoff

Matthias Althoff

Prof. Dr.

Cyber Physical Systems


[1224]
W. Li, W. Chen, S. Qian, J. Chen, D. Cremers and H. Li.
DynSUP: Dynamic Gaussian Splatting from An Unposed Image Pair.
Preprint (Dec. 2024). arXiv GitHub

MCML Authors
Link to website

Weirong Chen

Computer Vision & Artificial Intelligence

Link to website

Shenhan Qian

Computer Vision & Artificial Intelligence

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence

Link to website

Haoang Li

Dr.

* Former member


[1223]
Y. Li, M. Milling, L. Specia and B. W. Schuller.
From Audio Deepfake Detection to AI-Generated Music Detection – A Pathway and Overview.
Preprint (Dec. 2024). arXiv
Abstract

As Artificial Intelligence (AI) technologies continue to evolve, their use in generating realistic, contextually appropriate content has expanded into various domains. Music, an art form and medium for entertainment deeply rooted in human culture, is seeing increased involvement of AI in its production. However, despite the effective application of AI music generation (AIGM) tools, their unregulated use raises concerns about potential negative impacts on the music industry, copyright, and artistic integrity, underscoring the importance of effective AIGM detection. This paper provides an overview of existing AIGM detection methods. To lay a foundation for the general workings and challenges of AIGM detection, we first review general principles of AIGM, including recent advancements in audio deepfakes, as well as multimodal detection techniques. We further propose a potential pathway for leveraging foundation models from audio deepfake detection to AIGM detection. Additionally, we discuss the implications of these tools and propose directions for future research to address ongoing challenges in the field.

MCML Authors
Link to Profile Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[1222]
S. Liang, S. Wang, K. Li, M. Niemeyer, S. Gasperini, N. Navab and F. Tombari.
SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians.
Preprint (Dec. 2024). arXiv
Abstract

3D Gaussian Splatting has recently gained traction for its efficient training and real-time rendering. While the vanilla Gaussian Splatting representation is mainly designed for view synthesis, more recent works have investigated how to extend it with scene understanding and language features. However, existing methods lack a detailed comprehension of scenes, limiting their ability to segment and interpret complex structures. To this end, we introduce SuperGSeg, a novel approach that fosters cohesive, context-aware scene representation by disentangling segmentation and language field distillation. SuperGSeg first employs neural Gaussians to learn instance and hierarchical segmentation features from multi-view images with the aid of off-the-shelf 2D masks. These features are then leveraged to create a sparse set of what we call Super-Gaussians. Super-Gaussians facilitate the distillation of 2D language features into 3D space. Through Super-Gaussians, our method enables high-dimensional language feature rendering without extreme increases in GPU memory. Extensive experiments demonstrate that SuperGSeg outperforms prior works on both open-vocabulary object localization and semantic segmentation tasks.

MCML Authors
Link to website

Kunyi Li

Computer Aided Medical Procedures & Augmented Reality

Link to website

Stefano Gasperini

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[1221]
B. Ma, B. Yoztyurk, A.-C. Haensch, X. Wang, M. Herklotz, F. Kreuter, B. Plank and M. Assenmacher.
Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study.
Preprint (Dec. 2024). arXiv
Abstract

In recent research, large language models (LLMs) have been increasingly used to investigate public opinions. This study examines the algorithmic fidelity of LLMs, i.e., their ability to replicate the socio-cultural context and nuanced opinions of human participants. Using open-ended survey data from the German Longitudinal Election Studies (GLES), we prompt different LLMs to generate synthetic public opinions reflective of German subpopulations by incorporating demographic features into the persona prompts. Our results show that Llama performs better than other LLMs at representing subpopulations, particularly when there is lower opinion diversity within those groups. Our findings further reveal that the LLM performs better for supporters of left-leaning parties like The Greens and The Left compared to other parties, and matches least with the right-wing party AfD. Additionally, the inclusion or exclusion of specific variables in the prompts can significantly impact the models' predictions. These findings underscore the importance of aligning LLMs to more effectively model diverse public opinions while minimizing political biases and enhancing robustness in representativeness.

MCML Authors
Link to website

Bolei Ma

Social Data Science and AI Lab

Link to website

Xinpeng Wang

Artificial Intelligence and Computational Linguistics

Link to Profile Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab

Link to Profile Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1220]
Y. Mansour and R. Heckel.
Measuring Bias of Web-filtered Text Datasets and Bias Propagation Through Training.
Preprint (Dec. 2024). arXiv
Abstract

We investigate biases in pretraining datasets for large language models (LLMs) through dataset classification experiments. Building on prior work demonstrating the existence of biases in popular computer vision datasets, we analyze popular open-source pretraining datasets for LLMs derived from CommonCrawl including C4, RefinedWeb, DolmaCC, RedPajama-V2, FineWeb, and DCLM-Baseline. Despite those datasets being obtained with similar filtering and deduplication steps, neural networks can classify surprisingly well which dataset a single text sequence belongs to, significantly better than a human can. This indicates that popular pretraining datasets have their own unique biases or fingerprints. Those biases remain even when the text is rewritten with LLMs. Moreover, these biases propagate through training: Random sequences generated by models trained on those datasets can be classified well by a classifier trained on the original datasets.

MCML Authors
Link to Profile Reinhard Heckel

Reinhard Heckel

Prof. Dr.

Machine Learning


[1219]
P. Pisal, O. Krejci and P. Rinke.
Machine-learning Accelerated Descriptor Design for Catalyst Discovery: A CO2 to Methanol Conversion Case Study.
Preprint (Dec. 2024). arXiv
Abstract

Transforming CO2 into methanol represents a crucial step towards closing the carbon cycle, with thermoreduction technology nearing industrial application. However, obtaining high methanol yields and ensuring the stability of heterocatalysts remain significant challenges. Herein, we present a sophisticated computational framework to accelerate the discovery of novel thermal heterogeneous catalysts, using machine-learned force fields. We propose a new catalytic descriptor, termed adsorption energy distribution, that aggregates the binding energies for different catalyst facets, binding sites, and adsorbates. The descriptor is versatile and can easily be adjusted to a specific reaction through careful choice of the key-step reactants and reaction intermediates. By applying unsupervised machine learning and statistical analysis to a dataset comprising nearly 160 metallic alloys, we offer a powerful tool for catalyst discovery. Finally, we propose new promising candidate materials such as ZnRh and ZnPt3, which to our knowledge, have not yet been tested, and discuss their possible advantage in terms of stability.

MCML Authors
Link to Profile Patrick Rinke

Patrick Rinke

Prof. Dr.

AI-based Material Science


[1218]
A. Reithmeir, V. Spieker, V. Sideri-Lampretsa, D. Rückert, J. A. Schnabel and V. A. Zimmer.
From Model Based to Learned Regularization in Medical Image Registration: A Comprehensive Review.
Preprint (Dec. 2024). arXiv
Abstract

Image registration is fundamental in medical imaging applications, such as disease progression analysis or radiation therapy planning. The primary objective of image registration is to precisely capture the deformation between two or more images, typically achieved by minimizing an optimization problem. Due to its inherent ill-posedness, regularization is a key component in driving the solution toward anatomically meaningful deformations. A wide range of regularization methods has been proposed for both conventional and deep learning-based registration. However, the appropriate application of regularization techniques often depends on the specific registration problem, and no one-size-fits-all method exists. Despite its importance, regularization is often overlooked or addressed with default approaches, assuming existing methods are sufficient. A comprehensive and structured review remains missing. This review addresses this gap by introducing a novel taxonomy that systematically categorizes the diverse range of proposed regularization methods. It highlights the emerging field of learned regularization, which leverages data-driven techniques to automatically derive deformation properties from the data. Moreover, this review examines the transfer of regularization methods from conventional to learning-based registration, identifies open challenges, and outlines future research directions. By emphasizing the critical role of regularization in image registration, we hope to inspire the research community to reconsider regularization strategies in modern registration algorithms and to explore this rapidly evolving field further.

MCML Authors
Link to website

Anna Reithmeir

Computational Imaging and AI in Medicine

Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine

Link to Profile Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[1217]
M. Sabanayagam, L. Gosch, S. Günnemann and D. Ghoshdastidar.
Exact Certification of (Graph) Neural Networks Against Label Poisoning.
Preprint (Dec. 2024). arXiv
Abstract

Machine learning models are highly vulnerable to label flipping, i.e., the adversarial modification (poisoning) of training labels to compromise performance. Thus, deriving robustness certificates is important to guarantee that test predictions remain unaffected and to understand worst-case robustness behavior. However, for Graph Neural Networks (GNNs), the problem of certifying label flipping has so far been unsolved. We change this by introducing an exact certification method, deriving both sample-wise and collective certificates. Our method leverages the Neural Tangent Kernel (NTK) to capture the training dynamics of wide networks enabling us to reformulate the bilevel optimization problem representing label flipping into a Mixed-Integer Linear Program (MILP). We apply our method to certify a broad range of GNN architectures in node classification tasks. Thereby, concerning the worst-case robustness to label flipping: (i) we establish hierarchies of GNNs on different benchmark graphs; (ii) quantify the effect of architectural choices such as activations, depth and skip-connections; and surprisingly, (iii) uncover a novel phenomenon of the robustness plateauing for intermediate perturbation budgets across all investigated datasets and architectures. While we focus on GNNs, our certificates are applicable to sufficiently wide NNs in general through their NTK. Thus, our work presents the first exact certificate to a poisoning attack ever derived for neural networks, which could be of independent interest.

MCML Authors
Link to website

Lukas Gosch

Data Analytics & Machine Learning

Link to Profile Stephan Günnemann

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning


[1216]
C. Sauer, A.-L. Boulesteix, L. Hanßum, F. Hodiamont, C. Bausewein and T. Ullmann.
Beyond algorithm hyperparameters: on preprocessing hyperparameters and associated pitfalls in machine learning applications.
Preprint (Dec. 2024). arXiv
Abstract

Adequately generating and evaluating prediction models based on supervised machine learning (ML) is often challenging, especially for less experienced users in applied research areas. Special attention is required in settings where the model generation process involves hyperparameter tuning, i.e. data-driven optimization of different types of hyperparameters to improve the predictive performance of the resulting model. Discussions about tuning typically focus on the hyperparameters of the ML algorithm (e.g., the minimum number of observations in each terminal node for a tree-based algorithm). In this context, it is often neglected that hyperparameters also exist for the preprocessing steps that are applied to the data before it is provided to the algorithm (e.g., how to handle missing feature values in the data). As a consequence, users experimenting with different preprocessing options to improve model performance may be unaware that this constitutes a form of hyperparameter tuning - albeit informal and unsystematic - and thus may fail to report or account for this optimization. To illuminate this issue, this paper reviews and empirically illustrates different procedures for generating and evaluating prediction models, explicitly addressing the different ways algorithm and preprocessing hyperparameters are typically handled by applied ML users. By highlighting potential pitfalls, especially those that may lead to exaggerated performance claims, this review aims to further improve the quality of predictive modeling in ML applications.
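A minimal NumPy sketch (hypothetical data and model, not taken from the paper) of the protocol the authors advocate: preprocessing hyperparameters such as imputation statistics are fitted on the training fold only, then applied unchanged to the test fold, so that the informal tuning of preprocessing never leaks into the evaluation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(float)
X[rng.random(X.shape) < 0.2] = np.nan   # inject missing feature values

def impute_mean(X, means=None):
    """Column-wise mean imputation; passing `means` reuses statistics
    fitted on the training fold instead of refitting on test data."""
    if means is None:
        means = np.nanmean(X, axis=0)
    X = X.copy()
    for j in range(X.shape[1]):
        X[np.isnan(X[:, j]), j] = means[j]
    return X, means

# The imputation means are themselves tuned quantities: estimate them on
# the training fold only and apply them, frozen, to the test fold.
fold = np.arange(len(X)) % 5
accs = []
for k in range(5):
    tr, te = fold != k, fold == k
    X_tr, means = impute_mean(X[tr])            # fit preprocessing on train
    X_te, _ = impute_mean(X[te], means=means)   # apply without refitting
    w = np.linalg.lstsq(X_tr, 2 * y[tr] - 1, rcond=None)[0]  # toy linear model
    accs.append(float(np.mean((X_te @ w > 0) == (y[te] > 0))))
print(round(sum(accs) / 5, 3))
```

Computing the column means on the full dataset before splitting would be the pitfall the paper warns about: an unreported, unsystematic form of hyperparameter tuning.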

MCML Authors
Link to website

Christina Sauer (née Nießl)

Biometry in Molecular Medicine

Link to Profile Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

Link to website

Theresa Ullmann

Dr.

Biometry in Molecular Medicine


[1215]
N. Stracke, S. A. Baumann, K. Bauer, F. Fundel and B. Ommer.
CleanDIFT: Diffusion Features without Noise.
Preprint (Dec. 2024). arXiv
Abstract

Internal features from large-scale pre-trained diffusion models have recently been established as powerful semantic descriptors for a wide range of downstream tasks. Works that use these features generally need to add noise to images before passing them through the model to obtain the semantic features, as the models do not offer the most useful features when given images with little to no noise. We show that this noise has a critical impact on the usefulness of these features that cannot be remedied by ensembling with different random noises. We address this issue by introducing a lightweight, unsupervised fine-tuning method that enables diffusion backbones to provide high-quality, noise-free semantic features. We show that these features readily outperform previous diffusion features by a wide margin in a wide variety of extraction setups and downstream tasks, offering better performance than even ensemble-based methods at a fraction of the cost.

MCML Authors
Link to Profile Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[1214]
Q. Sun, Y. Li, E. Alturki, S. M. K. Murthy and B. W. Schuller.
Towards Friendly AI: A Comprehensive Review and New Perspectives on Human-AI Alignment.
Preprint (Dec. 2024). arXiv
Abstract

As Artificial Intelligence (AI) continues to advance rapidly, Friendly AI (FAI) has been proposed to advocate for more equitable and fair development of AI. Despite its importance, there is a lack of comprehensive reviews examining FAI from an ethical perspective, as well as limited discussion on its potential applications and future directions. This paper addresses these gaps by providing a thorough review of FAI, focusing on theoretical perspectives both for and against its development, and presenting a formal definition in a clear and accessible format. Key applications are discussed from the perspectives of eXplainable AI (XAI), privacy, fairness and affective computing (AC). Additionally, the paper identifies challenges in current technological advancements and explores future research avenues. The findings emphasise the significance of developing FAI and advocate for its continued advancement to ensure ethical and beneficial AI development.

MCML Authors
Link to Profile Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[1213]
A. Testoni, B. Plank and R. Fernández.
RACQUET: Unveiling the Dangers of Overlooked Referential Ambiguity in Visual LLMs.
Preprint (Dec. 2024). arXiv
Abstract

Ambiguity resolution is key to effective communication. While humans effortlessly address ambiguity through conversational grounding strategies, the extent to which current language models can emulate these strategies remains unclear. In this work, we examine referential ambiguity in image-based question answering by introducing RACQUET, a carefully curated dataset targeting distinct aspects of ambiguity. Through a series of evaluations, we reveal significant limitations and problems of overconfidence of state-of-the-art large multimodal language models in addressing ambiguity in their responses. The overconfidence issue becomes particularly relevant for RACQUET-BIAS, a subset designed to analyze a critical yet underexplored problem: failing to address ambiguity leads to stereotypical, socially biased responses. Our results underscore the urgency of equipping models with robust strategies to deal with uncertainty without resorting to undesirable stereotypes.

MCML Authors
Link to Profile Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1212]
J. Wang, Z. Qin, Y. Zhang, V. Hu, B. Ommer, R. Briq and S. Kesselheim.
Scaling Image Tokenizers with Grouped Spherical Quantization.
Preprint (Dec. 2024). arXiv
Abstract

Vision tokenizers have attracted significant attention due to their scalability and compactness; previous works depend on old-school GAN-based hyperparameters, biased comparisons, and a lack of comprehensive analysis of the scaling behaviours. To tackle those issues, we introduce Grouped Spherical Quantization (GSQ), featuring spherical codebook initialization and lookup regularization to constrain codebook latent to a spherical surface. Our empirical analysis of image tokenizer training strategies demonstrates that GSQ-GAN achieves superior reconstruction quality over state-of-the-art methods with fewer training iterations, providing a solid foundation for scaling studies. Building on this, we systematically examine the scaling behaviours of GSQ, specifically in latent dimensionality, codebook size, and compression ratios, and their impact on model performance. Our findings reveal distinct behaviours at high and low spatial compression levels, underscoring challenges in representing high-dimensional latent spaces. We show that GSQ can restructure high-dimensional latent into compact, low-dimensional spaces, thus enabling efficient scaling with improved quality. As a result, GSQ-GAN achieves a 16x down-sampling with a reconstruction FID (rFID) of 0.50.
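The core quantization step can be sketched in a few lines (a simplified illustration, not the paper's trained tokenizer): split each latent vector into groups, project groups and codebook entries onto the unit sphere, and snap every group to its nearest code by cosine similarity.

```python
import numpy as np

def gsq_quantize(z, codebook):
    """Sketch of grouped spherical quantization: `codebook` has shape
    (groups, codebook_size, group_dim); each latent in `z` is split into
    groups, both sides are L2-normalized onto the sphere, and every group
    is replaced by its most similar code."""
    n, d = z.shape
    g, _, dg = codebook.shape
    assert d == g * dg, "latent dim must factor into groups"
    z_g = z.reshape(n, g, dg)
    z_g = z_g / np.linalg.norm(z_g, axis=-1, keepdims=True)          # spherical latents
    cb = codebook / np.linalg.norm(codebook, axis=-1, keepdims=True)  # spherical codes
    sim = np.einsum('ngd,gkd->ngk', z_g, cb)     # per-group cosine similarity
    idx = sim.argmax(axis=-1)                    # (n, groups) selected code indices
    zq = cb[np.arange(g)[None, :], idx]          # gather codes -> (n, groups, dg)
    return zq.reshape(n, d), idx

rng = np.random.default_rng(0)
codebook = rng.normal(size=(4, 16, 8))   # 4 groups, 16 codes, 8 dims per group
z = rng.normal(size=(5, 32))
zq, idx = gsq_quantize(z, codebook)
print(zq.shape, idx.shape)   # (5, 32) (5, 4)
```

Grouping keeps each lookup low-dimensional, which is what lets the codebook scale without the collapse issues the abstract alludes to.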

MCML Authors
Link to Profile Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[1211]
Y. Wang, Q. Song, D. Wasif, M. Shahzad, C. Koller, J. Bamber and X. Zhu.
How Certain are Uncertainty Estimates? Three Novel Earth Observation Datasets for Benchmarking Uncertainty Quantification in Machine Learning.
Preprint (Dec. 2024). arXiv GitHub
Abstract

Uncertainty quantification (UQ) is essential for assessing the reliability of Earth observation (EO) products. However, the extensive use of machine learning models in EO introduces an additional layer of complexity, as those models themselves are inherently uncertain. While various UQ methods do exist for machine learning models, their performance on EO datasets remains largely unevaluated. A key challenge in the community is the absence of the ground truth for uncertainty, i.e. how certain the uncertainty estimates are, apart from the labels for the image/signal. This article fills this gap by introducing three benchmark datasets specifically designed for UQ in EO machine learning models. These datasets address three common problem types in EO: regression, image segmentation, and scene classification. They enable a transparent comparison of different UQ methods for EO machine learning models. We describe the creation and characteristics of each dataset, including data sources, preprocessing steps, and label generation, with a particular focus on calculating the reference uncertainty. We also showcase baseline performance of several machine learning models on each dataset, highlighting the utility of these benchmarks for model development and comparison. Overall, this article offers a valuable resource for researchers and practitioners working in artificial intelligence for EO, promoting a more accurate and reliable quality measure of the outputs of machine learning models.

MCML Authors
Link to Profile Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[1210]
J. Weidner, M. Balcerak, I. Ezhov, A. Datchev, L. Lux, L. Zimmer, D. Rückert, B. Menze and B. Wiestler.
Spatial Brain Tumor Concentration Estimation for Individualized Radiotherapy Planning.
Preprint (Dec. 2024). arXiv
Abstract

Biophysical modeling of brain tumors has emerged as a promising strategy for personalizing radiotherapy planning by estimating the otherwise hidden distribution of tumor cells within the brain. However, many existing state-of-the-art methods are computationally intensive, limiting their widespread translation into clinical practice. In this work, we propose an efficient and direct method that utilizes soft physical constraints to estimate the tumor cell concentration from preoperative MRI of brain tumor patients. Our approach optimizes a 3D tumor concentration field by simultaneously minimizing the difference between the observed MRI and a physically informed loss function. Compared to existing state-of-the-art techniques, our method significantly improves predicting tumor recurrence on two public datasets with a total of 192 patients while maintaining a clinically viable runtime of under one minute - a substantial reduction from the 30 minutes required by the current best approach. Furthermore, we showcase the generalizability of our framework by incorporating additional imaging information and physical constraints, highlighting its potential to translate to various medical diffusion phenomena with imperfect data.
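As a toy 1D analogue of this idea (illustrative assumptions throughout, not the paper's 3D model or its physics loss), one can estimate a concentration field by minimizing a data term on the observed voxels plus a soft diffusion-style smoothness constraint, here solved in closed form since the surrogate objective is quadratic.

```python
import numpy as np

def estimate_concentration(obs, mask, lam=1.0):
    """Recover a 1D concentration field c by minimizing
        ||mask * (c - obs)||^2 + lam * ||L c||^2,
    where L is the discrete Laplacian acting as a soft physical
    (diffusion/smoothness) constraint. The quadratic is solved directly."""
    n = len(obs)
    L = -2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    A = np.diag(mask) + lam * (L.T @ L)     # positive definite normal matrix
    c = np.linalg.solve(A, mask * obs)
    return np.clip(c, 0.0, 1.0)             # concentrations live in [0, 1]

x = np.arange(40)
obs = np.exp(-0.5 * ((x - 20) / 4.0) ** 2)   # synthetic "tumor" profile
mask = (x % 2 == 0).astype(float)            # only every other voxel observed
c = estimate_concentration(obs, mask)
print(c.shape)
```

The physics prior fills in the unobserved voxels smoothly; the paper's contribution is doing the analogous optimization in 3D, from real MRI, fast enough for clinical use.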

MCML Authors
Link to website

Laurin Lux

Artificial Intelligence in Healthcare and Medicine

Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine

Link to Profile Benedikt Wiestler

Benedikt Wiestler

Prof. Dr.

AI for Image-Guided Diagnosis and Therapy


[1209]
Y. Xia, Z. Li, Y.-J. Li, L. Shi, H. Cao, J. F. Henriques and D. Cremers.
UniLoc: Towards Universal Place Recognition Using Any Single Modality.
Preprint (Dec. 2024). arXiv GitHub
Abstract

To date, most place recognition methods focus on single-modality retrieval. While they perform well in specific environments, cross-modal methods offer greater flexibility by allowing seamless switching between map and query sources. It also promises to reduce computation requirements by having a unified model, and achieving greater sample efficiency by sharing parameters. In this work, we develop a universal solution to place recognition, UniLoc, that works with any single query modality (natural language, image, or point cloud). UniLoc leverages recent advances in large-scale contrastive learning, and learns by matching hierarchically at two levels: instance-level matching and scene-level matching. Specifically, we propose a novel Self-Attention based Pooling (SAP) module to evaluate the importance of instance descriptors when aggregated into a place-level descriptor. Experiments on the KITTI-360 dataset demonstrate the benefits of cross-modality for place recognition, achieving superior performance in cross-modal settings and competitive results also for uni-modal scenarios.
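The aggregation idea behind the proposed SAP module can be sketched as follows (a minimal illustration; the learnable query `w_query` and dot-product scoring are assumptions, not UniLoc's exact parametrization): score each instance descriptor, softmax the scores into importance weights, and pool into one place-level descriptor.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def self_attention_pool(instance_desc, w_query):
    """Sketch of self-attention based pooling: weight each instance
    descriptor by its learned importance before aggregating them into a
    single place-level descriptor."""
    scores = instance_desc @ w_query      # (num_instances,) importance scores
    weights = softmax(scores)             # normalized attention weights
    return weights @ instance_desc        # weighted sum -> (dim,) descriptor

rng = np.random.default_rng(0)
descs = rng.normal(size=(7, 16))   # 7 instance descriptors of dimension 16
w = rng.normal(size=16)            # stand-in for the module's learned query
place = self_attention_pool(descs, w)
print(place.shape)
```

Because the pooled descriptor lives in the same space regardless of the query modality, retrieval reduces to comparing place-level descriptors across language, image, and point-cloud inputs.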

MCML Authors
Link to website

Yan Xia

Dr.

Computer Vision & Artificial Intelligence

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1208]
Y. Xia, Y. Lu, R. Song, O. Dhaouadi, J. F. Henriques and D. Cremers.
TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes.
Preprint (Dec. 2024). arXiv GitHub
Abstract

We tackle the problem of localizing traffic surveillance cameras in cooperative perception. To overcome the lack of large-scale real-world intersection datasets, we introduce Carla Intersection, a new simulated dataset with 75 urban and rural intersections in Carla. Moreover, we introduce a novel neural network, TrafficLoc, localizing traffic cameras within a 3D reference map. TrafficLoc employs a coarse-to-fine matching pipeline. For image-point cloud feature fusion, we propose a novel Geometry-guided Attention Loss to address cross-modal viewpoint inconsistencies. During coarse matching, we propose an Inter-Intra Contrastive Learning to achieve precise alignment while preserving distinctiveness among local intra-features within image patch-point group pairs. Besides, we introduce Dense Training Alignment with a soft-argmax operator to consider additional features when regressing the final position. Extensive experiments show that our TrafficLoc improves the localization accuracy over the state-of-the-art image-to-point-cloud registration methods by a large margin (up to 86%) on Carla Intersection and generalizes well to real-world data. TrafficLoc also achieves new SOTA performance on KITTI and NuScenes datasets, demonstrating strong localization ability across both in-vehicle and traffic cameras.

MCML Authors
Link to website

Yan Xia

Dr.

Computer Vision & Artificial Intelligence

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1207]
R. Xiao, S. Kim, M.-I. Georgescu, Z. Akata and S. Alaniz.
FLAIR: VLM with Fine-grained Language-informed Image Representations.
Preprint (Dec. 2024). arXiv GitHub
Abstract

CLIP has shown impressive results in aligning images and texts at scale. However, its ability to capture detailed visual features remains limited because CLIP matches images and texts at a global level. To address this issue, we propose FLAIR, Fine-grained Language-informed Image Representations, an approach that utilizes long and detailed image descriptions to learn localized image embeddings. By sampling diverse sub-captions that describe fine-grained details about an image, we train our vision-language model to produce not only global embeddings but also text-specific image representations. Our model introduces text-conditioned attention pooling on top of local image tokens to produce fine-grained image representations that excel at retrieving detailed image content. We achieve state-of-the-art performance on both existing multimodal retrieval benchmarks and our newly introduced fine-grained retrieval task, which evaluates vision-language models’ ability to retrieve partial image content. Furthermore, our experiments demonstrate the effectiveness of FLAIR trained on 30M image-text pairs in capturing fine-grained visual information, including zero-shot semantic segmentation, outperforming models trained on billions of pairs.
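The text-conditioned pooling can be sketched as attention where the caption embedding acts as the query over local image tokens (shapes and the plain dot-product scoring are illustrative assumptions, not FLAIR's exact design):

```python
import numpy as np

def text_conditioned_pool(image_tokens, text_emb):
    """Sketch of text-conditioned attention pooling: the text embedding
    queries the local image tokens, so the pooled representation
    emphasizes the regions a (sub-)caption describes."""
    scores = image_tokens @ text_emb / np.sqrt(image_tokens.shape[1])
    w = np.exp(scores - scores.max())
    w /= w.sum()                       # softmax attention weights
    return w @ image_tokens            # text-specific image embedding

rng = np.random.default_rng(1)
tokens = rng.normal(size=(49, 32))     # e.g. a 7x7 grid of local image tokens
text = rng.normal(size=32)             # embedding of one detailed sub-caption
emb = text_conditioned_pool(tokens, text)
print(emb.shape)
```

Different sub-captions of the same image thus yield different pooled embeddings, which is what makes the representation "text-specific" rather than global.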

MCML Authors
Link to Profile Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1206]
X. Xue, G. Wei, H. Chen, H. Zhang, F. Lin, C. Shen and X. Zhu.
REO-VLM: Transforming VLM to Meet Regression Challenges in Earth Observation.
Preprint (Dec. 2024). arXiv
Abstract

The rapid evolution of Vision Language Models (VLMs) has catalyzed significant advancements in artificial intelligence, expanding research across various disciplines, including Earth Observation (EO). While VLMs have enhanced image understanding and data processing within EO, their applications have predominantly focused on image content description. This limited focus overlooks their potential in geographic and scientific regression tasks, which are essential for diverse EO applications. To bridge this gap, this paper introduces a novel benchmark dataset, called REO-Instruct, to unify regression and generation tasks specifically for the EO domain. Comprising 1.6 million multimodal EO imagery and language pairs, this dataset is designed to support both biomass regression and image content interpretation tasks. Leveraging this dataset, we develop REO-VLM, a groundbreaking model that seamlessly integrates regression capabilities with traditional generative functions. By utilizing language-driven reasoning to incorporate scientific domain knowledge, REO-VLM goes beyond solely relying on EO imagery, enabling comprehensive interpretation of complex scientific attributes from EO data. This approach establishes new performance benchmarks and significantly enhances the capabilities of environmental monitoring and resource management.

MCML Authors
Link to Profile Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[1205]
H. Ye, A. Wisiorek, A. Maronikolakis, Ö. Alaçam and H. Schütze.
A Federated Approach to Few-Shot Hate Speech Detection for Marginalized Communities.
Preprint (Dec. 2024). arXiv
Abstract

Hate speech online remains an understudied issue for marginalized communities, and has seen rising relevance, especially in the Global South, which includes developing societies with increasing internet penetration. In this paper, we aim to provide marginalized communities living in societies where the dominant language is low-resource with a privacy-preserving tool to protect themselves from hate speech on the internet by filtering offensive content in their native languages. Our contribution in this paper is twofold: 1) we release REACT (REsponsive hate speech datasets Across ConTexts), a collection of high-quality, culture-specific hate speech detection datasets comprising seven distinct target groups in eight low-resource languages, curated by experienced data collectors; 2) we propose a solution to few-shot hate speech detection utilizing federated learning (FL), a privacy-preserving and collaborative learning approach, to continuously improve a central model that exhibits robustness when tackling different target groups and languages. By keeping the training local to the users’ devices, we ensure the privacy of the users’ data while benefitting from the efficiency of federated learning. Furthermore, we personalize client models to target-specific training data and evaluate their performance. Our results indicate the effectiveness of FL across different target groups, whereas the benefits of personalization on few-shot learning are not clear.
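The federated learning scheme at the core of the proposal can be sketched with FedAvg on a toy classifier (hypothetical data and a plain logistic model, not the paper's hate-speech detector): each client trains locally on its private examples, and only weight updates reach the server.

```python
import numpy as np

def federated_round(global_w, client_data, lr=0.5, local_steps=20):
    """One FedAvg communication round: every client refines the global
    weights on its own (private) data with local gradient steps, and the
    server averages the resulting models. Raw data never leaves a client."""
    updates = []
    for X, y in client_data:
        w = global_w.copy()
        for _ in range(local_steps):
            p = 1 / (1 + np.exp(-(X @ w)))          # logistic predictions
            w -= lr * X.T @ (p - y) / len(y)        # local gradient step
        updates.append(w)
    return np.mean(updates, axis=0)                 # server-side aggregation

rng = np.random.default_rng(0)
clients = []
for _ in range(3):                                  # 3 clients sharing one task
    X = rng.normal(size=(30, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)
    clients.append((X, y))
w = np.zeros(2)
for _ in range(10):                                 # 10 communication rounds
    w = federated_round(w, clients)
acc = float(np.mean([((X @ w > 0) == (y > 0)).mean() for X, y in clients]))
print(round(acc, 2))
```

The paper additionally personalizes per-client models to target-specific data; in this sketch that would amount to keeping each client's locally refined `w` instead of only the average.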

MCML Authors
Link to website

Haotian Ye

Statistical NLP and Deep Learning

Link to website

Axel Wisiorek

Dr.

Statistical NLP and Deep Learning

Link to website

Antonis Maronikolakis

Statistical NLP and Deep Learning

Link to Profile Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1204]
Y. Yeganeh, I. Charisiadis, M. Hasny, M. Hartenberger, B. Ommer, N. Navab, A. Farshad and E. Adeli.
Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis.
Preprint (Dec. 2024). arXiv
Abstract

Scaling by training on large datasets has been shown to enhance the quality and fidelity of image generation and manipulation with diffusion models; however, such large datasets are not always accessible in medical imaging due to cost and privacy issues, which contradicts one of the main applications of such models to produce synthetic samples where real data is scarce. Also, finetuning on pre-trained general models has been a challenge due to the distribution shift between the medical domain and the pre-trained models. Here, we propose Latent Drift (LD) for diffusion models that can be adopted for any fine-tuning method to mitigate the issues faced by the distribution shift or employed in inference time as a condition. Latent Drifting enables diffusion models to be conditioned for medical images fitted for the complex task of counterfactual image generation, which is crucial to investigate how parameters such as gender, age, and adding or removing diseases in a patient would alter the medical images. We evaluate our method on three public longitudinal benchmark datasets of brain MRI and chest X-rays for counterfactual image generation. Our results demonstrate significant performance gains in various scenarios when combined with different fine-tuning schemes. The source code of this work will be publicly released upon its acceptance.

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality


[1203]
Y. Yeganeh, R. Xiao, G. Guvercin, N. Navab and A. Farshad.
Conformable Convolution for Topologically Aware Learning of Complex Anatomical Structures.
Preprint (Dec. 2024). arXiv
Abstract

While conventional computer vision emphasizes pixel-level and feature-based objectives, medical image analysis of intricate biological structures necessitates explicit representation of their complex topological properties. Despite their successes, deep learning models often struggle to accurately capture the connectivity and continuity of fine, sometimes pixel-thin, yet critical structures due to their reliance on implicit learning from data. Such shortcomings can significantly impact the reliability of analysis results and hinder clinical decision-making. To address this challenge, we introduce Conformable Convolution, a novel convolutional layer designed to explicitly enforce topological consistency. Conformable Convolution learns adaptive kernel offsets that preferentially focus on regions of high topological significance within an image. This prioritization is guided by our proposed Topological Posterior Generator (TPG) module, which leverages persistent homology. The TPG module identifies key topological features and guides the convolutional layers by applying persistent homology to feature maps transformed into cubical complexes. Our proposed modules are architecture-agnostic, enabling them to be integrated seamlessly into various architectures. We showcase the effectiveness of our framework in the segmentation task, where preserving the interconnectedness of structures is critical. Experimental results on three diverse datasets demonstrate that our framework effectively preserves the topology in the segmentation downstream task, both quantitatively and qualitatively.

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality


[1202]
H. Zeng, M. Gao and D. Cremers.
CoE: Deep Coupled Embedding for Non-Rigid Point Cloud Correspondences.
Preprint (Dec. 2024). arXiv
Abstract

The interest in matching non-rigidly deformed shapes represented as raw point clouds is rising due to the proliferation of low-cost 3D sensors. Yet, the task is challenging since point clouds are irregular and there is a lack of intrinsic shape information. We propose to tackle these challenges by learning a new shape representation – a per-point high dimensional embedding, in an embedding space where semantically similar points share similar embeddings. The learned embedding has multiple beneficial properties: it is aware of the underlying shape geometry and is robust to shape deformations and various shape artefacts, such as noise and partiality. Consequently, this embedding can be directly employed to retrieve high-quality dense correspondences through a simple nearest neighbor search in the embedding space. Extensive experiments demonstrate new state-of-the-art results and robustness in numerous challenging non-rigid shape matching benchmarks and show its great potential in other shape analysis tasks, such as segmentation.
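Given such embeddings, the retrieval step the abstract describes is literally a nearest-neighbor search; a minimal sketch with synthetic embeddings (brute force here, where a KD-tree or approximate index would be used at scale):

```python
import numpy as np

def dense_correspondences(emb_src, emb_tgt):
    """For each source point, return the index of the target point whose
    learned embedding is nearest in the embedding space."""
    d2 = ((emb_src[:, None, :] - emb_tgt[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

rng = np.random.default_rng(0)
emb_tgt = rng.normal(size=(100, 8))                         # target shape embeddings
perm = rng.permutation(100)
emb_src = emb_tgt[perm] + 0.01 * rng.normal(size=(100, 8))  # noisy permuted copy
match = dense_correspondences(emb_src, emb_tgt)
print(bool((match == perm).all()))
```

The heavy lifting is in learning embeddings where semantically similar points land close together despite deformation, noise, and partiality; once that holds, the matching itself stays this simple.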

MCML Authors
Link to website

Maolin Gao

Computer Vision & Artificial Intelligence

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1201]
A. Kathan, S. Amiriparian, L. Christ, S. Eulitz and B. W. Schuller.
Automatic Speech-Based Charisma Recognition and the Impact of Integrating Auxiliary Characteristics.
TELEPRESENCE 2024 - IEEE Conference on Telepresence. Pasadena, CA, USA, Nov 16-17, 2024. DOI
Abstract

Automatic recognition of speaker’s states and traits is crucial to facilitate a more naturalistic human-AI interaction – a key focus in human-computer interaction to enhance user experience. One particularly important trait in daily life is charisma. To date, its definition is still controversial. However, it seems that there are characteristics in speech that the majority perceives as charismatic. To this end, we address the novel speech-based task of charisma recognition in a three-fold approach. First, we predict charismatic speech using both interpretable acoustic features and embeddings of two audio Transformers. Afterwards, we make use of auxiliary labels that are highly correlated with charisma, including enthusiastic, likeable, attractive, warm, and leader-like, to check their impact on charisma recognition. Finally, we personalise the best model, taking individual speech characteristics into account. In our experiments, we demonstrate that the charisma prediction model benefits from integrating auxiliary characteristics as well as from the personalised approach, resulting in a best Pearson’s correlation coefficient of 0.4304.
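The reported evaluation metric, Pearson's correlation coefficient between predicted and gold charisma ratings, is the covariance of the two score series normalized by the product of their standard deviations; a small sketch with hypothetical ratings (not the paper's data):

```python
import numpy as np

def pearson_r(pred, gold):
    """Pearson's correlation coefficient between two rating series."""
    p, g = pred - pred.mean(), gold - gold.mean()
    return float((p @ g) / np.sqrt((p @ p) * (g @ g)))

gold = np.array([0.2, 0.5, 0.9, 0.4, 0.7])   # hypothetical listener ratings
pred = np.array([0.3, 0.4, 0.8, 0.5, 0.6])   # hypothetical model outputs
print(round(pearson_r(pred, gold), 3))
```

A value of 1 would mean the model ranks and scales charisma exactly as listeners do; the paper's best personalised model reaches 0.4304 on this scale.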

MCML Authors
Link to website

Alexander Kathan

Health Informatics