Publications

Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging

Amar Kumar

Anita Kriz

B. Pertzov

Tal Arbel

Vision-language foundation models (VLMs) have shown impressive performance in guiding image generation through text, with emerging applications in medical imaging. In this work, we are the first to investigate the question: 'Can fine-tuned foundation models help identify critical, and possibly unknown, data properties?' By evaluating our proposed method on a chest x-ray dataset, we show that these models can generate high-resolution, precisely edited images compared to methods that rely on Structural Causal Models (SCMs) according to numerous metrics. For the first time, we demonstrate that fine-tuned VLMs can reveal hidden data relationships that were previously obscured due to available metadata granularity and model capacity limitations. Our experiments demonstrate both the potential of these models to reveal underlying dataset properties while also exposing the limitations of fine-tuned VLMs for accurate image editing and susceptibility to biases and spurious correlations.

2025-03-30

ArXiv (preprint)

doi.org

arxiv.org

Steering CLIP's vision transformer with sparse autoencoders

Sonia Joseph

Praneet Suresh

Ethan Goldfarb

Lorenz Hufe

Yossi Gandelsman

Robert Graham

Danilo Bzdok

Wojciech Samek

Blake Richards

While vision models are highly capable, their internal mechanisms remain poorly understood, a challenge which sparse autoencoders (SAEs) have helped address in language, but which remains underexplored in vision. We address this gap by training SAEs on CLIP's vision transformer and uncover key differences between vision and language processing, including distinct sparsity patterns for SAEs trained across layers and token types. We then provide the first systematic analysis of the steerability of CLIP's vision transformer by introducing metrics to quantify how precisely SAE features can be steered to affect the model's output. We find that 10-15% of neurons and features are steerable, with SAEs providing thousands more steerable features than the base model. Through targeted suppression of SAE features, we then demonstrate improved performance on three vision disentanglement tasks (CelebA, Waterbirds, and typographic attacks), finding optimal disentanglement in middle model layers, and achieving state-of-the-art performance on defense against typographic attacks. We release our CLIP SAE models and code to support future research in vision transformer interpretability.

2025-03-30

thecvf.com/CVPR/2025/Workshop/MIV (poster)

openreview.net

Bridging biodiversity and ecosystem services through useful plant species

Nina Obiar

Isaac Eckert

Janelle Baker

Daniel Moerman

Laura J. Pollock

2025-03-28

PLANTS, PEOPLE, PLANET (published)

doi.org

Genetic Analysis of Polyunsaturated Fatty Acids Biosynthesis Pathway Determines Four Distinct Thraustochytrid Types

Sou‐Yu Cheng

Yi‐Jing Chen

Hsiu-Chin Lin

Hsin‐Yang Chang

Ming‐Der Huang

Thraustochytrids, diverse marine unicellular protists encompassing over 10 recognised genera, are renowned for synthesising polyunsaturated fatty acids (PUFAs), with content and composition varying substantially across genera. While PUFAs are known to be produced via PUFA synthase (PUFA‐S) and/or elongase/desaturase (ELO/DES) pathways, the distinctions in genes involved remain unexplored. This study analysed PUFA biosynthetic genes in 19 thraustochytrid strains across six genera, categorising them into four types. Type I exclusively utilises the ELO/DES pathway, Type II employs both PUFA‐S and complete ELO/DES pathways, while Types III and IV primarily rely on PUFA‐S, with Type III lacking the canonical Δ9 desaturase and Type IV missing most desaturase and elongase enzymes. Notably, the Δ9 desaturase and ATP‐citrate lyase (ACLY) are exclusive to Types I and II, while β‐carotene hydroxylase (CrtZ) is absent in these types. ACLY absence suggests alternative acetyl‐CoA supply pathways in Types III and IV, whereas CrtZ absence implies either a lack of specific xanthophylls or alternative biosynthetic pathways in Types I and II. Synteny analysis revealed conserved genomic organisation of PUFA biosynthetic genes, indicating a shared evolutionary trajectory. This study provides insights into the genetic diversity underlying PUFA biosynthesis in thraustochytrids, while proposing putative evolutionary pathways for the four lineages.

2025-03-28

Environmental Microbiology (published)

doi.org

Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments

Felix Heide

2025-03-28

ArXiv (preprint)

arxiv.org

Conditional Diffusion Models are Medical Image Classifiers that Provide Explainability and Uncertainty for Free

Gian Mario Favero

2025-03-27

MIDL.io/2025/Conference (oral)

doi.org

openreview.net

debug-gym: A Text-Based Environment for Interactive Debugging

Xingdi Yuan

Morgane M Moss

Charbel Feghali

Chinmay Singh

Darya Moldavskaya

Drew MacPhee

Lucas Caccia

Matheus Pereira

Minseon Kim

Alessandro Sordoni

Marc-Alexandre Côté

2025-03-27

ArXiv (preprint)

doi.org

arxiv.org

How do language models learn facts? Dynamics, curricula and hallucinations

Nicolas Zucchet

Jorg Bornschein

Stephanie Chan

Andrew Lampinen

Razvan Pascanu

Soham De

2025-03-27

ArXiv (preprint)

doi.org

arxiv.org

PRISM: High-Resolution & Precise Counterfactual Medical Image Generation using Language-guided Stable Diffusion

Amar Kumar

Anita Kriz

Mohammad Havaei

Tal Arbel

Developing reliable and generalizable deep learning systems for medical imaging faces significant obstacles due to spurious correlations, data imbalances, and limited text annotations in datasets. Addressing these challenges requires architectures robust to the unique complexities posed by medical imaging data. The rapid advancements in vision-language foundation models within the natural image domain prompt the question of how they can be adapted for medical imaging tasks. In this work, we present PRISM, a framework that leverages foundation models to generate high-resolution, language-guided medical image counterfactuals using Stable Diffusion. Our approach demonstrates unprecedented precision in selectively modifying spurious correlations (the medical devices) and disease features, enabling the removal and addition of specific attributes while preserving other image characteristics. Through extensive evaluation, we show how PRISM advances counterfactual generation and enables the development of more robust downstream classifiers for clinically deployable solutions. To facilitate broader adoption and research, we make our code publicly available at https://github.com/Amarkr1/PRISM.

2025-03-27

MIDL.io/2025/Conference (oral)

doi.org

openreview.net
