Research Group Xi Wang

Computer Vision & Artificial Intelligence

Xi Wang

leads the MCML Junior Research Group ‘Egocentric Vision’ at TU Munich.

Xi Wang and her team conduct cutting-edge research in egocentric vision, focusing on learning from first-person human videos to understand behavior patterns and extract valuable information for potential applications in robotics. Their ongoing projects include 3D reconstruction using Gaussian splitting and multimodal learning with vision-language models. Funded as a BMBF project, the group maintains close ties with MCML and actively seeks collaborations that bridge egocentric vision with other research domains, extending beyond our own focus.

Team members @MCML

PostDocs

Riccardo Marin

Dr.

B1 | Computer Vision
→ Group Xi Wang

Computer Vision & Artificial Intelligence

PhD Students

Abhishek Saroha

B1 | Computer Vision
→ Group Xi Wang

Computer Vision & Artificial Intelligence

Dominik Schnaus

B1 | Computer Vision
→ Group Xi Wang

Computer Vision & Artificial Intelligence

Publications @MCML

2025

[4]

C. Koke, D. Schnaus, Y. Shen, A. Saroha, M. Eisenberger, B. Rieck, M. M. Bronstein and D. Cremers.
On multi-scale Graph Representation Learning.
LMRL @ICLR 2025 - Workshop on Learning Meaningful Representations of Life at the 13th International Conference on Learning Representations (ICLR 2025). Singapore, Apr 24-28, 2025. To be published. Preprint available. URL

Abstract

While Graph Neural Networks (GNNs) are widely used in modern computational biology, an underexplored drawback of common GNN methods,is that they are not inherently multiscale consistent: Two graphs describing the same object or situation at different resolution scales are assigned vastly different latent representations. This prevents graph networks from generating data representations that are consistent across scales. It also complicates the integration of representations at the molecular scale with those generated at the biological scale. Here we discuss why existing GNNs struggle with multiscale consistency and show how to overcome this problem by modifying the message passing paradigm within GNNs.

MCML Authors

Christian Koke

Computer Vision & Artificial Intelligence

Dominik Schnaus

Computer Vision & Artificial Intelligence

Yuesong Shen

* Former Member

Abhishek Saroha

Computer Vision & Artificial Intelligence

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence

[3]

L. Sang, Z. Canfes, D. Cao, R. Marin, F. Bernard and D. Cremers.
TwoSquared: 4D Generation from 2D Image Pairs.
Preprint (Apr. 2025). arXiv

Abstract

Despite the astonishing progress in generative AI, 4D dynamic object generation remains an open challenge. With limited high-quality training data and heavy computing requirements, the combination of hallucinating unseen geometry together with unseen movement poses great challenges to generative models. In this work, we propose TwoSquared as a method to obtain a 4D physically plausible sequence starting from only two 2D RGB images corresponding to the beginning and end of the action. Instead of directly solving the 4D generation problem, TwoSquared decomposes the problem into two steps: 1) an image-to-3D module generation based on the existing generative model trained on high-quality 3D assets, and 2) a physically inspired deformation module to predict intermediate movements. To this end, our method does not require templates or object-class-specific prior knowledge and can take in-the-wild images as input. In our experiments, we demonstrate that TwoSquared is capable of producing texture-consistent and geometry-consistent 4D sequences only given 2D images.

MCML Authors

Lu Sang

Computer Vision & Artificial Intelligence

Riccardo Marin

Dr.

Computer Vision & Artificial Intelligence

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence

[2]

N. P. A. Vu, A. Saroha, O. Litany and D. Cremers.
GAS-NeRF: Geometry-Aware Stylization of Dynamic Radiance Fields.
Preprint (Mar. 2025). arXiv

Abstract

Current 3D stylization techniques primarily focus on static scenes, while our world is inherently dynamic, filled with moving objects and changing environments. Existing style transfer methods primarily target appearance – such as color and texture transformation – but often neglect the geometric characteristics of the style image, which are crucial for achieving a complete and coherent stylization effect. To overcome these shortcomings, we propose GAS-NeRF, a novel approach for joint appearance and geometry stylization in dynamic Radiance Fields. Our method leverages depth maps to extract and transfer geometric details into the radiance field, followed by appearance transfer. Experimental results on synthetic and real-world datasets demonstrate that our approach significantly enhances the stylization quality while maintaining temporal coherence in dynamic scenes.

MCML Authors

Abhishek Saroha

Computer Vision & Artificial Intelligence

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence

2024

[1]

L. Sang, M. Gao, A. Saroha and D. Cremers.
Enhancing Surface Neural Implicits with Curvature-Guided Sampling and Uncertainty-Augmented Representations.
Wild3D @ECCV 2024 - Workshop 3D Modeling, Reconstruction, and Generation in the Wild at the 18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. URL

Abstract

Neural implicits are a widely used surface presentation because they offer an adaptive resolution and support arbitrary topology changes. While previous works rely on ground truth point clouds or meshes, they often do not discuss the data acquisition and ignore the effect of input quality and sampling methods during reconstruction. In this paper, we introduce a sampling method with an uncertainty-augmented surface implicit representation that employs a sampling technique that considers the geometric characteristics of inputs. To this end, we introduce a strategy that efficiently computes differentiable geometric features, namely, mean curvatures, to guide the sampling phase during the training period. The uncertainty augmentation offers insights into the occupancy and reliability of the output signed distance value, thereby expanding representation capabilities into open surfaces. Finally, we demonstrate that our method improves the reconstruction of both synthetic and real-world data.

MCML Authors

Lu Sang

Computer Vision & Artificial Intelligence

Maolin Gao

Computer Vision & Artificial Intelligence

Abhishek Saroha

Computer Vision & Artificial Intelligence

Daniel Cremers

Prof. Dr.