Amar Kumar

Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging

Anita Kriz

B. Pertzov

2025-03-30

ArXiv (prépublication)

Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging

Anita Kriz

B. Pertzov

Vision-language foundation models (VLMs) have shown impressive performance in guiding image generation through text, with emerging applicati… (voir plus)ons in medical imaging. In this work, we are the first to investigate the question: 'Can fine-tuned foundation models help identify critical, and possibly unknown, data properties?' By evaluating our proposed method on a chest x-ray dataset, we show that these models can generate high-resolution, precisely edited images compared to methods that rely on Structural Causal Models (SCMs) according to numerous metrics. For the first time, we demonstrate that fine-tuned VLMs can reveal hidden data relationships that were previously obscured due to available metadata granularity and model capacity limitations. Our experiments demonstrate both the potential of these models to reveal underlying dataset properties while also exposing the limitations of fine-tuned VLMs for accurate image editing and susceptibility to biases and spurious correlations.

2025-03-30

ArXiv (prépublication)

PRISM: High-Resolution&Precise Counterfactual Medical Image Generation using Language-guided Stable Diffusion

Developing reliable and generalizable deep learning systems for medical imaging faces significant obstacles due to spurious correlations, da… (voir plus)ta imbalances, and limited text annotations in datasets. Addressing these challenges requires architectures robust to the unique complexities posed by medical imaging data. The rapid advancements in vision-language foundation models within the natural image domain prompt the question of how they can be adapted for medical imaging tasks. In this work, we present PRISM, a framework that leverages foundation models to generate high-resolution, language-guided medical image counterfactuals using Stable Diffusion. Our approach demonstrates unprecedented precision in selectively modifying spurious correlations (the medical devices) and disease features, enabling the removal and addition of specific attributes while preserving other image characteristics. Through extensive evaluation, we show how PRISM advances counterfactual generation and enables the development of more robust downstream classifiers for clinically deployable solutions. To facilitate broader adoption and research, we make our code publicly available at https://github.com/Amarkr1/PRISM.

2025-03-27

MIDL.io/2025/Conference (présentation orale)

openreview.net

RL4Med-DDPO: Reinforcement Learning for Controlled Guidance Towards Diverse Medical Image Generation using Vision-Language Foundation Models

Mohammed Mohammed

2025-03-20

ArXiv (prépublication)

PRISM: High-Resolution & Precise Counterfactual Medical Image Generation using Language-guided Stable Diffusion

Developing reliable and generalizable deep learning systems for medical imaging faces significant obstacles due to spurious correlations, da… (voir plus)ta imbalances, and limited text annotations in datasets. Addressing these challenges requires architectures robust to the unique complexities posed by medical imaging data. The rapid advancements in vision-language foundation models within the natural image domain prompt the question of how they can be adapted for medical imaging tasks. In this work, we present PRISM, a framework that leverages foundation models to generate high-resolution, language-guided medical image counterfactuals using Stable Diffusion. Our approach demonstrates unprecedented precision in selectively modifying spurious correlations (the medical devices) and disease features, enabling the removal and addition of specific attributes while preserving other image characteristics. Through extensive evaluation, we show how PRISM advances counterfactual generation and enables the development of more robust downstream classifiers for clinically deployable solutions. To facilitate broader adoption and research, we make our code publicly available at https://github.com/Amarkr1/PRISM.

2025-03-01

arXiv (publié)

RL4Med-DDPO: Reinforcement Learning for Controlled Guidance Towards Diverse Medical Image Generation using Vision-Language Foundation Models

Parham Saremi

Mohammed Mohammed

Zahra Tehrani Nasab

2025-03-01

arXiv (publié)

PRISM: High-Resolution&Precise Counterfactual Medical Image Generation using Language-guided Stable Diffusion

Developing reliable and generalizable deep learning systems for medical imaging faces significant obstacles due to spurious correlations, da… (voir plus)ta imbalances, and limited text annotations in datasets. Addressing these challenges requires architectures robust to the unique complexities posed by medical imaging data. The rapid advancements in vision-language foundation models within the natural image domain prompt the question of how they can be adapted for medical imaging tasks. In this work, we present PRISM, a framework that leverages foundation models to generate high-resolution, language-guided medical image counterfactuals using Stable Diffusion. Our approach demonstrates unprecedented precision in selectively modifying spurious correlations (the medical devices) and disease features, enabling the removal and addition of specific attributes while preserving other image characteristics. Through extensive evaluation, we show how PRISM advances counterfactual generation and enables the development of more robust downstream classifiers for clinically deployable solutions. To facilitate broader adoption and research, we make our code publicly available at https://github.com/Amarkr1/PRISM.

2025-02-28

ArXiv (prépublication)

AURA: A Multi-modal Medical Agent for Understanding, Reasoning and Annotation

Nima Fathi

2025-01-01

Agentic AI/CREATE/Clinical MLLMs@MICCAI (publié)

Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging

Anita Kriz

Barak Pertzov

2025-01-01

CVPR Workshops (publié)

The role of AI for MRI-analysis in multiple sclerosis—A brief overview

Jean-Pierre R. Falet

Steven Nobile

Aliya Szpindel

Berardino Barile

Joshua D. Durso-Finley

Douglas Arnold

2025-01-01

Frontiers Artif. Intell. (publié)