Luca Scimeca

Alumni Collaborator - UdeM

Principal Supervisor

Alex Hernández-García

Research Topics

Representation Learning

Deep Learning

Computational Biology

Causality

Bayesian Inference

Generative Models

Probabilistic Models

Website

Google Scholar

GitHub

Blog Posts

October 8, 2025

Why AI Models Hallucinate and How to Fix It

by

Praneet Suresh

Jack Stanley

Danilo Bzdok

Sonia Joseph

Luca Scimeca

Publications

From Noise to Narrative: Tracing the Origins of Hallucinations in Transformers

As generative AI systems become competent and democratized in science, business, and government, deeper insight into their failure modes now poses an acute need. The occasional volatility in their behavior, such as the propensity of transformer models to hallucinate, impedes trust and adoption of emerging AI solutions in high-stakes areas. In the present work, we establish how and when hallucinations arise in pre-trained transformer models through concept representations captured by sparse autoencoders, under scenarios with experimentally controlled uncertainty in the input space. Our systematic experiments reveal that the number of semantic concepts used by the transformer model grows as the input information becomes increasingly unstructured. In the face of growing uncertainty in the input space, the transformer model becomes prone to activate coherent yet input-insensitive semantic features, leading to hallucinated output. At its extreme, for pure-noise inputs, we identify a wide variety of robustly triggered and meaningful concepts in the intermediate activations of pre-trained transformer models, whose functional integrity we confirm through targeted steering. We also show that hallucinations in the output of a transformer model can be reliably predicted from the concept patterns embedded in transformer layer activations. This collection of insights on transformer internal processing mechanics has immediate consequences for aligning AI models with human values, AI safety, opening the attack surface for potential adversarial attacks, and providing a basis for automatic quantification of a model's hallucination risk.

2025-12-02

Conference on Neural Information Processing Systems (Accept (poster))

doi.org

openreview.net

Learning What Matters: Steering Diffusion via Spectrally Anisotropic Forward Noise

Luca Scimeca

Thomas Jiralerspong

Berton Earnshaw

Jason Hartford

Yoshua Bengio

2025-09-30

arXiv (published)

doi.org

arxiv.org

Torsional-GFN: a conditional conformation generator for small molecules

Lena Nehale Ezzine

Alex Hernández-García

2025-06-10

ICML.cc/2025/Workshop/GenBio (poster)

doi.org

openreview.net

Outsourced Diffusion Sampling: Efficient Posterior Inference in Latent Spaces of Generative Models

Any well-behaved generative model over a variable …

2025-04-30

International Conference on Machine Learning (poster)

doi.org

proceedings.mlr.press

Shaping Inductive Bias in Diffusion Models through Frequency-Based Noise Control

Thomas Jiralerspong

Berton Earnshaw

Jason Hartford

Yoshua Bengio

Luca Scimeca

Diffusion Probabilistic Models (DPMs) are powerful generative models that have achieved unparalleled success in a number of generative tasks. In this work, we aim to build inductive biases into the training and sampling of diffusion models to better accommodate the target distribution of the data to model. For topologically structured data, we devise a frequency-based noising operator to purposefully manipulate, and set, these inductive biases. We first show that appropriate manipulations of the noising forward process can lead DPMs to focus on particular aspects of the distribution to learn. We show that different datasets necessitate different inductive biases, and that appropriate frequency-based noise control induces increased generative performance compared to standard diffusion. Finally, we demonstrate the possibility of ignoring information at particular frequencies while learning. We show this in an image corruption and recovery task, where we train a DPM to recover the original target distribution after severe noise corruption.

2025-03-05

ICLR.cc/2025/Workshop/DeLTa (poster)

doi.org

openreview.net

Solving Bayesian Inverse Problems with Diffusion Priors and Off-Policy RL

Laurence Perreault-Levasseur

Yoshua Bengio

Glen Berseth

Nikolay Malkin

This paper presents a practical application of Relative Trajectory Balance (RTB), a recently introduced off-policy reinforcement learning (RL) objective that can asymptotically solve Bayesian inverse problems optimally. We extend the original work by using RTB to train conditional diffusion model posteriors from pretrained unconditional priors for challenging linear and non-linear inverse problems in vision, and science. We use the objective alongside techniques such as off-policy backtracking exploration to improve training. Importantly, our results show that existing training-free diffusion posterior methods struggle to perform effective posterior inference in latent space due to inherent biases.

2025-03-05

ICLR.cc/2025/Workshop/DeLTa (poster)

doi.org

openreview.net

Amortizing intractable inference in diffusion models for vision, language, and control

Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data,

2024-09-24

Neural Information Processing Systems (poster)

doi.org

openreview.net

Improved off-policy training of diffusion samplers

We study the problem of training diffusion models to sample from a distribution with a given unnormalized density or energy function. We benchmark several diffusion-structured inference methods, including simulation-based variational approaches and off-policy methods (continuous generative flow networks). Our results shed light on the relative advantages of existing algorithms while bringing into question some claims from past work. We also propose a novel exploration strategy for off-policy methods, based on local search in the target space with the use of a replay buffer, and show that it improves the quality of samples on a variety of target distributions. Our code for the sampling methods and benchmarks studied is made public at https://github.com/GFNOrg/gfn-diffusion as a base for future work on diffusion models for amortized inference.

2024-09-24

Neural Information Processing Systems (poster)

doi.org

openreview.net

Mitigating Shortcut Learning with Diffusion Counterfactuals and Diverse Ensembles

Luca Scimeca

Alexander Rubinstein

Damien Teney

Seong Joon Oh

Armand Mihai Nicolicioiu

Yoshua Bengio

Spurious correlations in the data, where multiple cues are predictive of the target labels, often lead to a phenomenon known as shortcut learning, where a model relies on erroneous, easy-to-learn cues while ignoring reliable ones. In this work, we propose

2023-11-22

arXiv (preprint)

openreview.net

Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts in Underspecified Visual Tasks

Luca Scimeca

Alexander Rubinstein

Armand Mihai Nicolicioiu

Damien Teney

Yoshua Bengio

Spurious correlations in the data, where multiple cues are predictive of the target labels, often lead to shortcut learning phenomena, where a model may rely on erroneous, easy-to-learn, cues while ignoring reliable ones. In this work, we propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs). We discover that DPMs have the inherent capability to represent multiple visual cues independently, even when they are largely correlated in the training data. We leverage this characteristic to encourage model diversity and empirically show the efficacy of the approach with respect to several diversification objectives. We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.

2023-10-02

arXiv (preprint)

doi.org

arxiv.org
