Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging
Vision-language foundation models (VLMs) have shown impressive performance in guiding image generation through text, with emerging applications in medical imaging. In this work, we are the first to investigate the question: 'Can fine-tuned foundation models help identify critical, and possibly unknown, data properties?' By evaluating our proposed method on a chest x-ray dataset, we show that these models can generate high-resolution, precisely edited images compared to methods that rely on Structural Causal Models (SCMs) according to numerous metrics. For the first time, we demonstrate that fine-tuned VLMs can reveal hidden data relationships that were previously obscured due to available metadata granularity and model capacity limitations. Our experiments demonstrate the potential of these models to reveal underlying dataset properties, and also expose the limitations of fine-tuned VLMs for accurate image editing as well as their susceptibility to biases and spurious correlations.
Steering CLIP's vision transformer with sparse autoencoders
Ethan Goldfarb
Lorenz Hufe
Yossi Gandelsman
Robert Graham
Wojciech Samek
While vision models are highly capable, their internal mechanisms remain poorly understood, a challenge which sparse autoencoders (SAEs) have helped address in language, but which remains underexplored in vision. We address this gap by training SAEs on CLIP's vision transformer and uncover key differences between vision and language processing, including distinct sparsity patterns for SAEs trained across layers and token types. We then provide the first systematic analysis of the steerability of CLIP's vision transformer by introducing metrics to quantify how precisely SAE features can be steered to affect the model's output. We find that 10-15% of neurons and features are steerable, with SAEs providing thousands more steerable features than the base model. Through targeted suppression of SAE features, we then demonstrate improved performance on three vision disentanglement tasks (CelebA, Waterbirds, and typographic attacks), finding optimal disentanglement in middle model layers, and achieving state-of-the-art performance on defense against typographic attacks. We release our CLIP SAE models and code to support future research in vision transformer interpretability.
Bridging biodiversity and ecosystem services through useful plant species
Nina Obiar
Isaac Eckert
Janelle Baker
Daniel Moerman
Genetic Analysis of Polyunsaturated Fatty Acids Biosynthesis Pathway Determines Four Distinct Thraustochytrid Types
Sou‐Yu Cheng
Yi‐Jing Chen
Hsin‐Yang Chang
Ming‐Der Huang
Thraustochytrids, diverse marine unicellular protists encompassing over 10 recognised genera, are renowned for synthesising polyunsaturated fatty acids (PUFAs), with content and composition varying substantially across genera. While PUFAs are known to be produced via PUFA synthase (PUFA‐S) and/or elongase/desaturase (ELO/DES) pathways, the distinctions in genes involved remain unexplored. This study analysed PUFA biosynthetic genes in 19 thraustochytrid strains across six genera, categorising them into four types. Type I exclusively utilises the ELO/DES pathway, Type II employs both PUFA‐S and complete ELO/DES pathways, while Types III and IV primarily rely on PUFA‐S, with Type III lacking the canonical Δ9 desaturase and Type IV missing most desaturase and elongase enzymes. Notably, the Δ9 desaturase and ATP‐citrate lyase (ACLY) are exclusive to Types I and II, while β‐carotene hydroxylase (CrtZ) is absent in these types. ACLY absence suggests alternative acetyl‐CoA supply pathways in Types III and IV, whereas CrtZ absence implies either a lack of specific xanthophylls or alternative biosynthetic pathways in Types I and II. Synteny analysis revealed conserved genomic organisation of PUFA biosynthetic genes, indicating a shared evolutionary trajectory. This study provides insights into the genetic diversity underlying PUFA biosynthesis in thraustochytrids, while proposing putative evolutionary pathways for the four lineages.
Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments
Conditional Diffusion Models are Medical Image Classifiers that Provide Explainability and Uncertainty for Free
debug-gym: A Text-Based Environment for Interactive Debugging
Xingdi Yuan
Morgane M Moss
Charbel Feghali
Chinmay Singh
Darya Moldavskaya
Drew MacPhee
Lucas Caccia
Matheus Pereira
Minseon Kim
Marc-Alexandre Côté
How do language models learn facts? Dynamics, curricula and hallucinations
Nicolas Zucchet
Jorg Bornschein
Stephanie Chan
Andrew Lampinen
Soham De