27.11.2025
Large vision-language models (VLMs) like CLIP (Contrastive Language-Image Pre-training) have changed how AI handles combined image and text inputs by learning to connect pictures and words. Given an image with a caption like “a dog playing with a ball”, CLIP learns to link visual patterns (the …
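As a rough illustration of the idea described in this teaser (not of the post's own method), the following minimal Python sketch scores an image against a few candidate captions using the publicly released CLIP checkpoint via Hugging Face transformers. The local file name dog.jpg and the caption list are placeholder assumptions.

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

# Load the openly available CLIP checkpoint and its paired preprocessor.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("dog.jpg")  # placeholder image path (assumption)
captions = [
    "a dog playing with a ball",
    "a cat sleeping on a couch",
    "a plate of spaghetti",
]

# Tokenize the captions and preprocess the image in one call.
inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the image-text similarity scores;
# a softmax turns them into a distribution over the candidate captions.
probs = outputs.logits_per_image.softmax(dim=-1)
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{p:.3f}  {caption}")
```

The caption with the highest score is the one CLIP considers the best textual match for the image, which is exactly the image-text linking the teaser alludes to.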
24.11.2025
During my research stay at Stanford University from July to September 2025, I had the pleasure of being part of the research group led by Assistant Professor Serena Yeung in the Department of Biomedical Data Science. My two-month stay in California gave me the opportunity to investigate how public scientific articles can be leveraged to build …
20.11.2025
State-of-the-art diffusion models like DiT and Stable Diffusion have made AI image generation incredibly powerful. But they still struggle with one big issue: scaling to large images or videos quickly and efficiently without exhausting your GPU memory. What if we could process images faster, use less memory, and still retain visual quality—without …