
Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research
Scientific Director, Leadership Team
Observer, Board of Directors, Mila

Biography

For media requests, please write to medias@mila.quebec.

For more information, please contact Julie Mongeau, executive assistant, at julie.mongeau@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” shared with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific director of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize, and in 2022 he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, a Fellow of the Royal Society of Canada, a Knight of the Legion of Honor of France, and an Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

(Student names were not preserved in this page extraction; only program and affiliation labels remain. The roster spans Professional Master's, Master's Research, PhD, postdoctoral, undergraduate, and research-intern positions, along with visiting researchers, collaborating researchers, and collaborating alumni, based primarily at Université de Montréal, with others at McGill University, MIT, École Polytechnique Fédérale de Lausanne, the Technical University of Munich, HKUST, RWTH Aachen University, Université Paris-Saclay, Université du Québec à Rimouski, Imperial College London, the Max Planck Institute for Intelligent Systems, and Valence.)

Publications

Model evaluation for extreme risks
Toby Shevlane
Sebastian Farquhar
Ben Garfinkel
Mary Phuong
Jess Whittlestone
Jade Leung
Daniel Kokotajlo
Nahema A. Marchal
Markus Anderljung
Noam Kolt
Lewis Ho
Divya Siddarth
Shahar Avin
W. Hawkins
Been Kim
Iason Gabriel
Vijay Bolina
Jack Clark
Paul F. Christiano
Allan Dafoe
Responses of pyramidal cell somata and apical dendrites in mouse visual cortex over multiple days
Colleen J Gillon
Jérôme A. Lecoq
Jason E. Pina
Ruweida Ahmed
Yazan N. Billeh
Shiella Caldejon
Peter Groblewski
Timothy M. Henley
India Kato
Eric Lee
Jennifer Luviano
Kyla Mace
Chelsea Nayan
Thuyanh V. Nguyen
Kat North
Jed Perkins
Sam Seid
Matthew T. Valley
Ali Williford
Timothy P. Lillicrap
Joel Zylberberg
Automated Detection of Anatomical Landmarks During Colonoscopy Using a Deep Learning Model
Mahsa Taghiakbari
Sina Hamidi Ghalehjegh
Emmanuel Jehanno
Tess Berthier
Lisa Di Jorio
Saber Ghadakzadeh
Alan Barkun
Mark Takla
Mickael Bouin
Eric Deslandres
Simon Bouchard
Sacha Sidani
Daniel von Renteln
Abstract: Background and aims: Identification and photo-documentation of the ileocecal valve (ICV) and appendiceal orifice (AO) confirm completeness of colonoscopy examinations. We aimed to develop and test a deep convolutional neural network (DCNN) model that can automatically identify ICV and AO, and differentiate these landmarks from normal mucosa and colorectal polyps. Methods: We prospectively collected annotated full-length colonoscopy videos of 318 patients undergoing outpatient colonoscopies. We created three nonoverlapping training, validation, and test data sets with 25,444 unaltered frames extracted from the colonoscopy videos showing four landmarks/image classes (AO, ICV, normal mucosa, and polyps). A DCNN classification model was developed, validated, and tested in separate data sets of images containing the four different landmarks. Results: After training and validation, the DCNN model could identify both AO and ICV in 18 out of 21 patients (85.7%). The accuracies of the model for differentiating AO from normal mucosa and ICV from normal mucosa were 86.4% (95% CI 84.1% to 88.5%) and 86.4% (95% CI 84.1% to 88.6%), respectively. Furthermore, the accuracy of the model for differentiating polyps from normal mucosa was 88.6% (95% CI 86.6% to 90.3%). Conclusion: This model offers a novel tool to assist endoscopists with automated identification of AO and ICV during colonoscopy. The model can reliably distinguish these anatomical landmarks from normal mucosa and colorectal polyps. It can be implemented into automated colonoscopy report generation, photo-documentation, and quality auditing solutions to improve colonoscopy reporting quality.
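The pipeline described is a standard multi-class frame classifier. Below is a minimal PyTorch sketch of that setup, assuming an ImageNet-pretrained backbone; the paper's exact architecture and hyperparameters are not given here, so these choices are illustrative only.

```python
# Minimal sketch of a four-class colonoscopy frame classifier (illustrative;
# not the paper's exact backbone, class order, or training configuration).
import torch
import torch.nn as nn
from torchvision import models

CLASSES = ["appendiceal_orifice", "ileocecal_valve", "normal_mucosa", "polyp"]

# Start from an ImageNet-pretrained backbone and replace the final layer.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(CLASSES))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(frames: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step on a batch of video frames (N, 3, H, W)."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(frames), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```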
Combining Parameter-efficient Modules for Task-level Generalisation
Better Training of GFlowNets with Local Credit and Incomplete Trajectories
Ling Pan
Nikolay Malkin
Dinghuai Zhang
Equivariance With Learned Canonicalization Functions
Sékou-Oumar Kaba
Arnab Kumar Mondal
Yan Zhang
Symmetry-based neural networks often constrain the architecture in order to achieve invariance or equivariance to a group of transformations. In this paper, we propose an alternative that avoids this architectural constraint by learning to produce a canonical representation of the data. These canonicalization functions can readily be plugged into non-equivariant backbone architectures. We offer explicit ways to implement them for many groups of interest. We show that this approach enjoys universality while providing interpretable insights. Our main hypothesis is that learning a neural network to perform canonicalization is better than doing it using predefined heuristics. Our results show that learning the canonicalization function indeed leads to better results and that the approach achieves great performance in practice.
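The core idea is to map each input to a canonical pose before a standard backbone sees it. A minimal sketch for planar rotations follows, assuming a plain CNN predicts the rotation; the paper treats general groups and uses an equivariant canonicalization function, so this simplification is illustrative only.

```python
# Sketch of learned canonicalization for planar image rotations (illustrative;
# the paper's canonicalization networks are themselves equivariant).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Canonicalizer(nn.Module):
    """Predicts a rotation angle and un-rotates the input into a canonical pose."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 2),  # predicts an unnormalized (cos t, sin t)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u = F.normalize(self.net(x), dim=-1)            # unit (cos t, sin t)
        cos, sin, zeros = u[:, 0], u[:, 1], torch.zeros_like(u[:, 0])
        # Build the inverse-rotation affine grid and resample each image.
        theta = torch.stack(
            [torch.stack([cos, sin, zeros], -1),
             torch.stack([-sin, cos, zeros], -1)], 1)   # (N, 2, 3)
        grid = F.affine_grid(theta, x.shape, align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

# Any non-equivariant backbone can then consume the canonicalized input:
# logits = backbone(Canonicalizer()(images))
```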
Hyena Hierarchy: Towards Larger Convolutional Language Models
Michael Poli
Stefano Massaroli
Eric Nguyen
Daniel Y Fu
Tri Dao
Stephen Baccus
Stefano Ermon
Christopher Re
Recent advances in deep learning have relied heavily on the use of large Transformers due to their ability to learn at scale. However, the core building block of Transformers, the attention operator, exhibits quadratic cost in sequence length, limiting the amount of context accessible. Existing subquadratic methods based on low-rank and sparse approximations need to be combined with dense attention layers to match Transformers at scale, indicating a gap in capability. In this work, we propose Hyena, a subquadratic drop-in replacement for attention constructed by interleaving implicitly parametrized long convolutions and data-controlled gating. In challenging reasoning tasks on sequences of thousands to hundreds of thousands of tokens, Hyena improves accuracy by more than 50 points over operators relying on state-space models, transfer functions, and other implicit and explicit methods, matching attention-based models. We set a new state-of-the-art for dense-attention-free architectures on language modeling on the standard datasets WikiText103 and The Pile, reaching Transformer quality with a 20% reduction in training compute required at sequence length 2k. Hyena operators are 2x faster than highly optimized attention at sequence length 8k, with speedups of 100x at 64k.
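The two building blocks named in the abstract, implicitly parametrized long convolutions and data-controlled gating, can be sketched compactly. The following is a hedged approximation of one such operator, not the released Hyena code; the filter MLP and gating projection are illustrative assumptions.

```python
# Sketch of a Hyena-style operator: a long convolution whose filter is
# produced implicitly by an MLP over positions (applied via FFT, so the cost
# is O(L log L)), followed by data-controlled multiplicative gating.
import torch
import torch.nn as nn

class LongConvGate(nn.Module):
    def __init__(self, dim: int, max_len: int):
        super().__init__()
        # Implicit filter: an MLP maps positions t -> filter values h(t).
        self.pos = nn.Parameter(torch.linspace(0, 1, max_len).unsqueeze(-1),
                                requires_grad=False)
        self.filter_mlp = nn.Sequential(nn.Linear(1, 32), nn.GELU(),
                                        nn.Linear(32, dim))
        self.gate_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, L, D)
        B, L, D = x.shape
        h = self.filter_mlp(self.pos[:L])                  # (L, D) filter
        # Causal convolution via FFT, zero-padded to length 2L.
        Hf = torch.fft.rfft(h, n=2 * L, dim=0)             # (L + 1, D)
        Xf = torch.fft.rfft(x, n=2 * L, dim=1)             # (B, L + 1, D)
        y = torch.fft.irfft(Xf * Hf, n=2 * L, dim=1)[:, :L]
        # Data-controlled gating: modulate by a projection of the input.
        return torch.sigmoid(self.gate_proj(x)) * y
```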
Interventional Causal Representation Learning
Kartik Ahuja
Yixin Wang
Divyat Mahajan
Causal representation learning seeks to extract high-level latent factors from low-level sensory data. Most existing methods rely on observational data and structural assumptions (e.g., conditional independence) to identify the latent factors. However, interventional data is prevalent across applications. Can interventional data facilitate causal representation learning? We explore this question in this paper. The key observation is that interventional data often carries geometric signatures of the latent factors' support (i.e., what values each latent can possibly take). For example, when the latent factors are causally connected, interventions can break the dependency between the intervened latents' support and their ancestors'. Leveraging this fact, we prove that the latent causal factors can be identified up to permutation and scaling given data from perfect interventions.
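The guarantee has the standard "identification up to permutation and scaling" form. A generic LaTeX rendering of that kind of statement follows; the notation is illustrative, not the paper's exact theorem.

```latex
% Generic form of identification up to permutation and scaling
% (illustrative notation; not the paper's exact theorem statement).
Let $z \in \mathbb{R}^d$ denote the latent factors and $x = g(z)$ the
observations, with $g$ injective. If an estimator $(\hat{g}, \hat{z})$
matches the data distributions under observation and perfect interventions,
then
\[
  \hat{z} \;=\; P \, \Lambda \, z ,
\]
where $P$ is a permutation matrix and $\Lambda$ is an invertible diagonal
(scaling) matrix: each learned latent equals one true latent up to rescaling.
```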
Multi-Objective GFlowNets
Moksh J. Jain
Sharath Chandra Raparthy
Alex Hernandez-Garcia
Jarrid Rector-Brooks
Santiago Miret
Emmanuel Bengio
We study the problem of generating diverse candidates in the context of multi-objective optimization. In many applications of machine learning, such as drug discovery and material design, the goal is to generate candidates which simultaneously optimize a set of potentially conflicting objectives. Moreover, these objectives are often imperfect evaluations of some underlying property of interest, making it important to generate diverse candidates to have multiple options for expensive downstream evaluations. We propose Multi-Objective GFlowNets (MOGFNs), a novel method for generating diverse Pareto-optimal solutions, based on GFlowNets. We introduce two variants of MOGFNs: MOGFN-PC, which models a family of independent sub-problems defined by a scalarization function, with reward-conditional GFlowNets, and MOGFN-AL, which solves a sequence of sub-problems defined by an acquisition function in an active learning loop. Our experiments on a wide variety of synthetic and benchmark tasks demonstrate the advantages of the proposed methods in terms of Pareto performance and, importantly, improved candidate diversity, which is the main contribution of this work.
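MOGFN-PC reduces the multi-objective problem to a family of scalarized sub-problems indexed by a preference vector. A minimal sketch of weighted-sum scalarization with a sampled preference follows, assuming the GFlowNet policy takes the preference as a conditioning input; the function names and values here are illustrative, not the paper's code.

```python
# Sketch of preference-conditioned scalarization in the style of MOGFN-PC:
# sample a preference w on the simplex, scalarize the per-objective rewards,
# and train a single preference-conditioned policy (policy itself not shown).
import torch

def sample_preference(num_objectives: int) -> torch.Tensor:
    """Draw w from a Dirichlet prior over the probability simplex."""
    return torch.distributions.Dirichlet(torch.ones(num_objectives)).sample()

def scalarized_reward(objective_values: torch.Tensor,
                      w: torch.Tensor) -> torch.Tensor:
    """Weighted-sum scalarization R(x, w) = sum_i w_i * R_i(x).
    objective_values: (num_objectives,) rewards for one candidate x."""
    return (w * objective_values).sum()

# Per-iteration skeleton: the policy receives w as extra conditioning, so one
# network covers the whole Pareto front as w varies.
w = sample_preference(num_objectives=3)
rewards = torch.tensor([0.2, 0.9, 0.5])   # illustrative R_i(x) values
r = scalarized_reward(rewards, w)          # reward used for the sampled x
```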
Catalyzing next-generation Artificial Intelligence through NeuroAI
Anthony Zador
Sean Escola
Bence Ölveczky
Kwabena Boahen
Matthew Botvinick
Dmitri Chklovskii
Anne Churchland
Claudia Clopath
James DiCarlo
Surya Ganguli
Jeff Hawkins
Konrad Paul Kording
Alexei Koulakov
Yann LeCun
Timothy P. Lillicrap
Adam Marblestone
Bruno Olshausen
Alexandre Pouget
Cristina Savin
Terrence Sejnowski
Eero Simoncelli
Sara Solla
David Sussillo
Andreas S. Tolias
Doris Tsao
Proactive Contact Tracing
Prateek Gupta
Martin Weiss
Nasim Rahaman
Hannah Alsdurf
Nanor Minoyan
Soren Harnois-Leblanc
Joanna Merckx
Andrew Williams
Victor Schmidt
Pierre-Luc St-Charles
Akshay Patel
Yang Zhang
Bernhard Schölkopf