Huy Le

Research Master's - UdeM

Principal supervisor

Aishwarya Agrawal

Research topics

Representation learning

Multimodal learning

Deep learning

Vision and language

Website

Google Scholar

GitHub

Publications

How and What to Imagine? Visual Thinking in Unified Multimodal Models for Cross-View Spatial Reasoning

Cross-view spatial reasoning remains a weak spot for vision-language models (VLMs): they often reason in language and lose the fine-grained geometry needed for the task. Thinking with images aims to address this by generating an intermediate thinking image, but recent work shows that models often ignore the visual evidence in these traces. We therefore ask how to make visual thinking matter, and what kind of visual thinking works best. We study these questions in unified multimodal models (UMMs), which natively support interleaved image-text generation. For the first question, we propose View Dropout (VDrop), a training-time intervention that hides parts of one input view from the answer span while keeping them visible to the thinking-image tokens. This encourages the model to use the thinking image when answering, instead of relying only on the input views. Once the thinking image is used for answer prediction, we study which type of visual thinking is most effective. We frame this as a learnability-informativeness tradeoff and compare three thinking-image variants: top-down, panoramic, and point-matching renderings. Trained on synthetic scenes and evaluated on five real-world out-of-domain benchmarks, panoramic visual thinking with VDrop is the only configuration that is both informative and learnable, and it achieves the best out-of-domain generalization.

2026-05-25

arXiv (preprint)

doi.org

arxiv.org

Deep learning of chest X-rays can predict mechanical ventilation outcome in ICU-admitted COVID-19 patients

Daniel Gourdeau

Olivier Potvin

Jason Henry Biem

Florence Cloutier

Lyna Abrougui

Patrick Archambault

Carl Chartrand-Lefebvre

Louis Dieumegarde

Christian Gagné

Louis Gagnon

Raphaelle Giguère

Alexandre Hains

Huy Le

Simon Lemieux

Marie-Hélène Lévesque

Simon Nepveu

Lorne Rosenbloom

An Tang

Issac Yang

Nathalie Duchesne

Simon Duchesne

2022-04-12

Scientific Reports (published)

doi.org
