Yoshua Bengio

Biography

*For media requests, please write to medias@mila.quebec.

For more information, contact Cassidy MacNeil, Senior Assistant and Operations Lead, at cassidy.macneil@mila.quebec.

Recognized worldwide as a leading expert in artificial intelligence, Yoshua Bengio is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, "the Nobel Prize of computing," with Geoffrey Hinton and Yann LeCun. He is a Full Professor at Université de Montréal, Founder and Scientific Advisor of Mila – Quebec Artificial Intelligence Institute, and co-directs, as Senior Fellow, the Learning in Machines & Brains program of the Canadian Institute for Advanced Research (CIFAR). He also serves as Special Advisor and Founding Scientific Director of IVADO.

In 2018, he was the computer scientist who collected the largest number of new citations worldwide. In 2019, he was awarded the prestigious Killam Prize. Since 2022, he has held the highest h-index in computer science worldwide. He is a Fellow of the Royal Society of London and of the Royal Society of Canada, and an Officer of the Order of Canada.

Concerned about the social impact of AI and committed to the goal that AI benefit everyone, he contributed actively to the Montreal Declaration for the Responsible Development of Artificial Intelligence.

Current Students

Jamal Abou Haibeh

Collaborating Alumni - McGill

Anaïs Berkes

Collaborating researcher - Cambridge University

Principal supervisor:

Rim Assouel

PhD - UdeM

Shahana Chatterjee

Collaborating researcher - N/A

Principal supervisor:

PhD - UdeM

Collaborating researcher - KAIST

PhD - UdeM

Independent visiting researcher

Principal supervisor:

PhD - UdeM

Co-supervisor:

PhD - UdeM

PhD - UdeM

PhD

PhD - UdeM

Moksh Jain

PhD - UdeM

PhD - UdeM

Principal supervisor:

Collaborating Alumni - UdeM

Minsu Kim

Collaborating researcher - UdeM

Hyeonah Kim

Postdoctorate - UdeM

Principal supervisor:

Alex Hernández-García

Tabitha Edith Lee

Postdoctorate - UdeM

Principal supervisor:

Collaborating Alumni

Song LIU

Collaborating researcher - N/A

Cristian Dragos Manta

PhD - UdeM

Co-supervisor:

Dhanya Sridhar

Sarthak Mittal

PhD - UdeM

Principal supervisor:

Independent visiting researcher - UdeM

Padideh Nouri

PhD - UdeM

Principal supervisor:

Ali Parviz

Collaborating researcher - Ying Wu Coll of Computing

Camille Rochefort-Boulanger

Lena Podina

Collaborating researcher - University of Waterloo

Principal supervisor:

PhD - UdeM

Postdoctorate - UdeM

Postdoctorate - UdeM

PhD - UdeM

Principal supervisor:

Julie Hussin

Divya Sharma

Postdoctorate

Co-supervisor:

Alex Hernández-García

Mélisande Astrid Crystal Teng

Collaborating Alumni - UdeM

Co-supervisor:

Hugo Larochelle

Ivan Titov

Collaborating researcher

Principal supervisor:

Siva Reddy

Skipper: Combining Spatial and Temporal Abstraction to Improve Generalization

Alex Tong

Collaborating Alumni - UdeM

Collaborating Alumni - UdeM

Co-supervisor:

PhD - UdeM

Principal supervisor:

Collaborating researcher - UdeM

Collaborating researcher

Collaborating researcher - UdeM

PhD - UdeM

PhD - McGill

Principal supervisor:

Harry Zhao

Collaborating Alumni - McGill

Principal supervisor:

Blog Posts

February 22, 2024

by

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Scaling in the service of reasoning & model-based ML

April 4, 2023

by

Yoshua Bengio

Edward J. Hu

A collaboration between Mila and Relation Therapeutics to discover novel synergistic combinations of drugs in vitro

March 23, 2022

by

Paul Bertin

Jake P. Taylor-King

Yoshua Bengio

Generative Flow Networks

March 15, 2022

by

Yoshua Bengio

Publications

Leveraging a Fully Differentiable Integrated Assessment Model for RL and Inference

Koen Ponse

Kai-Hendrik Cohrs

Phillip Wozny

Andrew Robert Williams

Tianyu Zhang

Erman Acar

Aske Plaat

Thomas M. Moerland

Pierre Gentine

Gustau Camps-Valls

2025-11-20

EurIPS.cc/2025/Workshop/DiffSys (published)

openreview.net

A HOT Dataset: 150,000 Buildings for HVAC Operations Transfer Research

Anaïs Berkes

David Rolnick

Donna Vakalis

About 12% of global energy consumption is attributable to heating, ventilation, and air conditioning (HVAC) systems in buildings [11]. Machine learning-based intelligent HVAC control offers significant energy efficiency potential, but progress is constrained by limited data for training and evaluating performance across different kinds of buildings. Existing datasets primarily target energy prediction rather than control applications, forcing studies to rely on limited building sets or single-variable perturbations that fail to capture real-world complexity. We present HOT (HVAC Operations Transfer), the first large-scale open-source dataset purpose-built for research into transfer learning in building control. HOT contains 159,744 unique building-weather combinations with systematic variations across envelope properties, occupancy patterns, and climate conditions spanning all 19 ASHRAE climate zones across 76 global locations. We formalise a comprehensive similarity-based framework with quantitative metrics for assessing transfer feasibility between source and target buildings across multiple context dimensions. Our key contributions: (1) a large-scale, open dataset and tooling enabling systematic, multi-variable transfer studies across 19 climate zones; (2) a quantitative similarity framework spanning geometry, thermal, climate, and function; and (3) zero-shot climate transfer experiments showing why realistic context variation matters for HVAC control.

2025-11-10

ACM Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation (published)

A HOT Dataset: 150,000 Buildings for HVAC Operations Transfer Research

Anaïs Berkes

David Rolnick

Donna Vakalis

2025-11-10

Proceedings of the 12th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation (published)

Scaling Latent Reasoning via Looped Language Models

Ruiming Zhu

Zixuan Wang

Kai Hua

Tianyu Zhang

Ziniu Li

Haoran Que

Boyi Wei

Zixin Wen

Fan Yin

He Xing

Li Li

Jiajun Shi

Kaijing Ma

Shanda Li

Taylor Kergan

Andrew C. Smith

Xin Qu

Mude Hui

Bohong Wu

Qiyang Min

Hongzhi Huang

Xun Zhou

Wei Ye

Jiaheng Liu

Jian Yang

Yunfeng Shi

Chenghua Lin

Enduo Zhao

Tianle Cai

Ge Zhang

Wenhao Huang

Jason K. Eshraghian

Modern LLMs are trained to "think" primarily via explicit text generation, such as chain-of-thought (CoT), which defers reasoning to post-training and under-leverages pre-training data. We present and open-source Ouro, named after the recursive Ouroboros, a family of pre-trained Looped Language Models (LoopLM) that instead build reasoning into the pre-training phase through (i) iterative computation in latent space, (ii) an entropy-regularized objective for learned depth allocation, and (iii) scaling to 7.7T tokens. Ouro 1.4B and 2.6B models enjoy superior performance that match the results of up to 12B SOTA LLMs across a wide range of benchmarks. Through controlled experiments, we show this advantage stems not from increased knowledge capacity, but from superior knowledge manipulation capabilities. We also show that LoopLM yields reasoning traces more aligned with final outputs than explicit CoT. We hope our results show the potential of LoopLM as a novel scaling direction in the reasoning era. Our model is available here: http://ouro-llm.github.io.

2025-10-28

ArXiv (preprint)

Deep-learning-based virtual screening of antibacterial compounds

Gabriele Scalia

Steven T. Rutherford

Ziqing Lu

Kerry R. Buchholz

Nicholas Skelton

Kangway Chuang

Nathaniel Diamant

Jan-Christian Hütter

Jerome-Maxim Luescher

Anh Miu

Jeff Blaney

Leo Gendelev

Elizabeth Skippington

Greg Zynda

Nia Dickson

Michał Koziarski

Aviv Regev

Man-Wah Tan

Tommaso Biancalani

2025-10-23

Nature Biotechnology (unknown)

Surrogate-based quantification of policy uncertainty in generative flow networks

Ramón Nartallo-Kaluarachchi

Robert Manson-Sawko

Shashanka Ubaru

Dongsung Huh

Malgorzata J. Zimoń

Lior Horesh

2025-10-23

ArXiv (preprint)

Learning What Matters: Steering Diffusion via Spectrally Anisotropic Forward Noise

Luca Scimeca

Thomas Jiralerspong

Berton Earnshaw

Jason Hartford

2025-10-06

ArXiv (preprint)

Monte Carlo Tree Diffusion for System 2 Planning

Jaesik Yoon

Hyeonseo Cho

Doojin Baek

Sungjin Ahn

Diffusion models have recently emerged as a powerful tool for planning. However, unlike Monte Carlo Tree Search (MCTS), whose performance naturally improves with inference-time computation scaling, standard diffusion-based planners offer only limited avenues for scalability. In this paper, we introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of MCTS. Our method reconceptualizes denoising as a tree-structured process, allowing partially denoised plans to be iteratively evaluated, pruned, and refined. By selectively expanding promising trajectories while retaining the flexibility to revisit and improve suboptimal branches, MCTD achieves the benefits of MCTS such as controlling exploration-exploitation trade-offs within the diffusion framework. Empirical results on challenging long-horizon tasks show that MCTD outperforms diffusion baselines, yielding higher-quality solutions as inference-time computation increases.

2025-10-05

Proceedings of the 42nd International Conference on Machine Learning (published)

proceedings.mlr.press

Towards a Formal Theory of Representational Compositionality

2025-10-05

Proceedings of the 42nd International Conference on Machine Learning (published)

proceedings.mlr.press

HVAC-SPICE: Value-Uncertainty In-Context RL with Thompson Sampling for Zero-Shot HVAC Control

Anaïs Berkes

Urban buildings consume 40% of global energy, yet most rely on inefficient rule-based HVAC systems due to the impracticality of deploying advanced controllers across diverse building stock. In-context reinforcement learning (ICRL) offers promise for rapid deployment without per-building training, but standard supervised learning objectives that maximise likelihood of training actions inherit behaviour-policy bias and provide weak exploration under the distribution shifts common when transferring across buildings and climates. We present SPICE (Sampling Policies In-Context with Ensemble uncertainty), a novel ICRL method specifically designed for zero-shot building control that addresses these fundamental limitations. SPICE introduces two key methodological innovations: (i) a propensity-corrected, return-aware training objective that prioritises high-advantage, high-uncertainty actions to enable improvement beyond suboptimal training demonstrations, and (ii) lightweight value ensembles with randomised priors that provide explicit uncertainty estimates for principled episode-level Thompson sampling. At deployment, SPICE samples one value head per episode and acts greedily, resulting in temporally coherent exploration without test-time gradients or building-specific models. We establish a comprehensive experimental protocol using the HOT dataset to evaluate SPICE across diverse building types and climate zones, focusing on the energy efficiency, occupant comfort, and zero-shot transfer capabilities that are critical for urban-scale deployment.

2025-09-29

NeurIPS.cc/2025/Workshop/UrbanAI (poster)

openreview.net

Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models

Johan Samir Obando Ceron

Brian R. Bartoldson

Bhavya Kailkhura

Guillaume Lajoie

Glen Berseth

Nikolay Malkin

Moksh J. Jain

Test-time scaling methods improve the capabilities of large language models (LLMs) by increasing the amount of compute used during inference to make a prediction. Inference-time compute can be scaled in parallel by choosing among multiple independent solutions or sequentially through self-refinement. We propose Recursive Self-Aggregation (RSA), a test-time scaling method inspired by evolutionary methods that combines the benefits of both parallel and sequential scaling. Each step of RSA refines a population of candidate reasoning chains through aggregation of subsets to yield a population of improved solutions, which are then used as the candidate pool for the next iteration. RSA exploits the rich information embedded in the reasoning chains (not just the final answers) and enables bootstrapping from partially correct intermediate steps within different chains of thought. Empirically, RSA delivers substantial performance gains with increasing compute budgets across diverse tasks, model families and sizes. Notably, RSA enables Qwen3-4B-Instruct-2507 to achieve competitive performance with larger reasoning models, including DeepSeek-R1 and o3-mini (high), while outperforming purely parallel and sequential scaling strategies across AIME-25, HMMT-25, Reasoning Gym, LiveCodeBench-v6, and SuperGPQA. We further demonstrate that training the model to combine solutions via a novel aggregation-aware reinforcement learning approach yields significant performance gains. Code available at https://github.com/HyperPotatoNeo/RSA.

2025-09-29

ArXiv (preprint)

Active Attacks: Red-teaming LLMs via Adaptive Environments

Taeyoung YUN

Pierre-Luc St-Charles

Jinkyoo Park