
Research Group Xi Wang


Xi Wang

Dr.

JRG Leader Egocentric Vision

Computer Vision & Artificial Intelligence

Xi Wang leads the MCML Junior Research Group ‘Egocentric Vision’ at TU Munich.

Xi Wang and her team conduct cutting-edge research in egocentric vision, focusing on learning from first-person human videos to understand behavior patterns and extract valuable information for potential applications in robotics. Their ongoing projects include 3D reconstruction using Gaussian splatting and multimodal learning with vision-language models. Funded as a BMBF project, the group maintains close ties with MCML and actively seeks collaborations that bridge egocentric vision with other research domains beyond its own focus.

Team members @MCML

PostDocs

Riccardo Marin

Dr.

Computer Vision & Artificial Intelligence

PhD Students

Abhishek Saroha

Computer Vision & Artificial Intelligence

Dominik Schnaus

Computer Vision & Artificial Intelligence

Publications @MCML

2025


[3]
C. Koke, D. Schnaus, Y. Shen, A. Saroha, M. Eisenberger, B. Rieck, M. M. Bronstein and D. Cremers.
On multi-scale Graph Representation Learning.
LMRL @ICLR 2025 - Workshop on Learning Meaningful Representations of Life at the 13th International Conference on Learning Representations (ICLR 2025). Singapore, Apr 24-28, 2025. To be published. Preprint available.
Abstract

While Graph Neural Networks (GNNs) are widely used in modern computational biology, an underexplored drawback of common GNN methods is that they are not inherently multiscale consistent: two graphs describing the same object or situation at different resolution scales are assigned vastly different latent representations. This prevents graph networks from generating data representations that are consistent across scales. It also complicates the integration of representations at the molecular scale with those generated at the biological scale. Here we discuss why existing GNNs struggle with multiscale consistency and show how to overcome this problem by modifying the message passing paradigm within GNNs.
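
To make the inconsistency concrete, here is a minimal, self-contained sketch (a toy illustration, not the paper's modified message-passing scheme): a plain mean-aggregation GNN layer assigns noticeably different pooled representations to a path graph and to a coarsened version of the same path. The graphs, features, and weights below are made up purely for illustration.

    import torch

    torch.manual_seed(0)

    def gnn_layer(adj, feats, weight):
        # one message-passing step: mean over neighbours (incl. self), then linear map + ReLU
        adj_self = adj + torch.eye(adj.shape[0])
        deg = adj_self.sum(dim=1, keepdim=True)
        return torch.relu((adj_self / deg) @ feats @ weight)

    def graph_repr(adj, feats, weight):
        # graph-level representation: mean-pool node embeddings after one layer
        return gnn_layer(adj, feats, weight).mean(dim=0)

    # fine graph: a path on 6 nodes; coarse graph: the same path with node pairs merged
    adj_fine = torch.zeros(6, 6)
    for i in range(5):
        adj_fine[i, i + 1] = 1.0
        adj_fine[i + 1, i] = 1.0
    adj_coarse = torch.zeros(3, 3)
    for i in range(2):
        adj_coarse[i, i + 1] = 1.0
        adj_coarse[i + 1, i] = 1.0

    feats_fine = torch.rand(6, 4)
    feats_coarse = 0.5 * (feats_fine[0::2] + feats_fine[1::2])  # average features of merged pairs
    weight = torch.rand(4, 4)

    gap = torch.norm(graph_repr(adj_fine, feats_fine, weight)
                     - graph_repr(adj_coarse, feats_coarse, weight))
    print("representation gap between scales:", gap.item())  # generally far from zero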

MCML Authors

Christian Koke

Computer Vision & Artificial Intelligence

Dominik Schnaus

Computer Vision & Artificial Intelligence

Yuesong Shen

Computer Vision & Artificial Intelligence

Abhishek Saroha

Computer Vision & Artificial Intelligence

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[2]
L. Sang, Z. Canfes, D. Cao, R. Marin, F. Bernard and D. Cremers.
TwoSquared: 4D Generation from 2D Image Pairs.
Preprint (Apr. 2025). arXiv
Abstract

Despite the astonishing progress in generative AI, 4D dynamic object generation remains an open challenge. With limited high-quality training data and heavy computing requirements, the combination of hallucinating unseen geometry together with unseen movement poses great challenges to generative models. In this work, we propose TwoSquared as a method to obtain a 4D physically plausible sequence starting from only two 2D RGB images corresponding to the beginning and end of the action. Instead of directly solving the 4D generation problem, TwoSquared decomposes the problem into two steps: 1) an image-to-3D generation module based on an existing generative model trained on high-quality 3D assets, and 2) a physically inspired deformation module to predict intermediate movements. As a result, our method does not require templates or object-class-specific prior knowledge and can take in-the-wild images as input. In our experiments, we demonstrate that TwoSquared is capable of producing texture-consistent and geometry-consistent 4D sequences given only 2D images.
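
A rough sketch of the two-step decomposition described in the abstract, with hypothetical interfaces and placeholder geometry rather than the authors' released code: lift each of the two RGB images to a 3D shape with an off-the-shelf image-to-3D generator, then let a deformation module fill in the intermediate frames (a plain linear blend stands in for the physically inspired deformation module here).

    import numpy as np

    def image_to_3d(image):
        # placeholder for an off-the-shelf image-to-3D generator; returns N x 3 vertices
        rng = np.random.default_rng(int(image.sum()) % (2**32))
        return rng.standard_normal((1024, 3))

    def deform(verts_start, verts_end, t):
        # stand-in for the physically inspired deformation module: a plain linear blend;
        # the actual module predicts physically plausible intermediate motion
        return (1.0 - t) * verts_start + t * verts_end

    img_start = np.zeros((256, 256, 3))  # RGB image at the beginning of the action
    img_end = np.ones((256, 256, 3))     # RGB image at the end of the action

    verts_start = image_to_3d(img_start)
    verts_end = image_to_3d(img_end)     # assumed to be in correspondence with verts_start
    frames = [deform(verts_start, verts_end, t) for t in np.linspace(0.0, 1.0, 8)]
    print(len(frames), frames[0].shape)  # 8 time steps of a 3D shape, i.e. a 4D sequence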

MCML Authors

Lu Sang

Computer Vision & Artificial Intelligence

Riccardo Marin

Dr.

Computer Vision & Artificial Intelligence

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1]
N. P. A. Vu, A. Saroha, O. Litany and D. Cremers.
GAS-NeRF: Geometry-Aware Stylization of Dynamic Radiance Fields.
Preprint (Mar. 2025). arXiv
Abstract

Current 3D stylization techniques primarily focus on static scenes, while our world is inherently dynamic, filled with moving objects and changing environments. Existing style transfer methods primarily target appearance – such as color and texture transformation – but often neglect the geometric characteristics of the style image, which are crucial for achieving a complete and coherent stylization effect. To overcome these shortcomings, we propose GAS-NeRF, a novel approach for joint appearance and geometry stylization in dynamic Radiance Fields. Our method leverages depth maps to extract and transfer geometric details into the radiance field, followed by appearance transfer. Experimental results on synthetic and real-world datasets demonstrate that our approach significantly enhances the stylization quality while maintaining temporal coherence in dynamic scenes.
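
A minimal sketch of the two-stage idea in the abstract, using made-up tensors and a Gram-matrix style loss as a stand-in for the actual objectives (this is not the authors' implementation): first align geometric statistics through rendered depth maps, then transfer appearance on top.

    import torch
    import torch.nn.functional as F

    def gram_matrix(feat):
        # channel-wise Gram matrix of a (C, H, W) feature map, a common style statistic
        c, h, w = feat.shape
        flat = feat.reshape(c, h * w)
        return flat @ flat.T / (c * h * w)

    # placeholder renders from a dynamic radiance field and maps derived from a style image
    rendered_depth = torch.rand(1, 64, 64, requires_grad=True)
    rendered_feat = torch.rand(16, 64, 64, requires_grad=True)
    style_depth = torch.rand(1, 64, 64)
    style_feat = torch.rand(16, 64, 64)

    # stage 1: geometry stylization, aligning depth statistics with the style depth map
    geometry_loss = F.mse_loss(gram_matrix(rendered_depth), gram_matrix(style_depth))
    # stage 2: appearance transfer on top of the stylized geometry
    appearance_loss = F.mse_loss(gram_matrix(rendered_feat), gram_matrix(style_feat))

    (geometry_loss + appearance_loss).backward()  # gradients would drive radiance-field updates
    print(float(geometry_loss), float(appearance_loss))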

MCML Authors

Abhishek Saroha

Computer Vision & Artificial Intelligence

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence