
Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research
Founder and Scientific Advisor, Leadership Team
Research Topics
Causality
Computational Neuroscience
Deep Learning
Generative Models
Graph Neural Networks
Machine Learning Theory
Medical Machine Learning
Molecular Modeling
Natural Language Processing
Probabilistic Models
Reasoning
Recurrent Neural Networks
Reinforcement Learning
Representation Learning

Biography

For media requests, please write to medias@mila.quebec.

For more information, please contact Cassidy MacNeil, Senior Assistant and Operations Lead, at cassidy.macneil@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” shared with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as special advisor and founding scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize, and in 2022 he was the most cited computer scientist in the world as measured by h-index. He is a Fellow of the Royal Society of London, a Fellow of the Royal Society of Canada, a Knight of the Legion of Honor of France, and an Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Publications

Generalization in Machine Learning via Analytical Learning Theory
This paper introduces a novel measure-theoretic theory for machine learning that does not require statistical assumptions. Based on this theory, a new regularization method in deep learning is derived and shown to outperform previous methods on CIFAR-10, CIFAR-100, and SVHN. Moreover, the proposed theory provides a theoretical basis for a family of practically successful regularization methods in deep learning. We discuss several consequences of our results on one-shot learning, representation learning, deep learning, and curriculum learning. Unlike statistical learning theory, the proposed learning theory analyzes each problem instance individually via measure theory, rather than a set of problem instances via statistics. As a result, it provides different types of results and insights when compared to statistical learning theory.
Towards Understanding Generalization via Analytical Learning Theory
Boundary Seeking GANs
R Devon Hjelm
Athul Jacob
Adam Trischler
Gerry Che
Combining Model-based and Model-free RL via Multi-step Control Variates
Tong Che
Yuchen Lu
George Tucker
Surya Bhupatiraju
Shane Gu
Sergey Levine
Learning Generative Models with Locally Disentangled Latent Factors
Finding Flatter Minima with SGD
Stanisław Jastrzębski
Amos Storkey
Graph Priors for Deep Neural Networks
In this work we explore how gene-gene interaction graphs can be used as a prior for the representation of a model, constructing features based on known interactions between genes. Most existing machine learning work on graphs focuses on building models when data is confined to a graph structure; here we focus on using the information from a graph to build better representations in our models. We use the percolate task, determining whether a path exists across a grid for a set of node values, as a proxy for gene pathways. We create variants of the percolate task to explore where existing methods fail and test their limits, in order to determine what must be improved before applying these methods to a real task. This leads us to propose new methods based on Graph Convolutional Networks (GCN) that use pooling and dropout to deal with noise in the graph prior.
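
To make the idea concrete, here is a minimal PyTorch sketch of a classifier that uses a fixed interaction graph as a structural prior, with dropout and a global pooling step to cope with noisy edges. The architecture, layer sizes, and names (GCNLayer, GraphPriorNet) are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One graph-convolution layer: node features are mixed along the
    edges of a fixed prior graph via a normalized adjacency matrix."""
    def __init__(self, adj, in_dim, out_dim):
        super().__init__()
        # Symmetrically normalize A + I once (Kipf & Welling style).
        a = adj + torch.eye(adj.size(0))
        d = a.sum(dim=1).rsqrt()
        self.register_buffer("a_norm", d.unsqueeze(1) * a * d.unsqueeze(0))
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):  # x: (batch, nodes, in_dim)
        x = torch.einsum("ij,bjf->bif", self.a_norm, x)
        return F.relu(self.linear(x))

class GraphPriorNet(nn.Module):
    """Hypothetical classifier using a gene-gene interaction graph as a
    prior; dropout lets the model tolerate noise in the graph."""
    def __init__(self, adj, hidden=32, p_drop=0.5):
        super().__init__()
        self.gcn1 = GCNLayer(adj, 1, hidden)
        self.gcn2 = GCNLayer(adj, hidden, hidden)
        self.drop = nn.Dropout(p_drop)
        self.head = nn.Linear(hidden, 2)

    def forward(self, x):  # x: (batch, nodes) scalar node values
        h = self.gcn1(x.unsqueeze(-1))
        h = self.drop(self.gcn2(h))
        h = h.mean(dim=1)  # global mean pooling over nodes
        return self.head(h)
```

A (nodes × nodes) 0/1 adjacency built from a known interaction database would be passed in as adj; the dropout rate controls how much the model is allowed to distrust individual edges of the prior.
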
SGD Smooths the Sharpest Directions
Stanisław Jastrzębski
Amos Storkey
Stochastic gradient descent (SGD) is able to find regions that generalize well, even in drastically over-parametrized models such as deep neural networks. We observe that noise in SGD controls the spectral norm and conditioning of the Hessian throughout training. We hypothesize that this phenomenon arises from the dynamics of neurons saturating their non-linearity along the largest-curvature directions, leading to improved conditioning.
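
One way to reproduce the kind of measurement described above is power iteration on Hessian-vector products, which estimates the Hessian's largest eigenvalue (its spectral norm, since the Hessian is symmetric) without ever materializing the matrix. A minimal sketch, assuming a scalar, twice-differentiable loss; the helper name and iteration count are illustrative.

```python
import torch

def hessian_spectral_norm(loss, params, iters=20):
    """Estimate the largest Hessian eigenvalue of `loss` w.r.t. `params`
    by power iteration on Hessian-vector products."""
    # First-order gradients, kept in the graph so we can differentiate again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    lam = 0.0
    for _ in range(iters):
        norm = torch.sqrt(sum((x * x).sum() for x in v))
        v = [x / norm for x in v]
        # Pearlmutter's trick: Hv = d(g . v)/dparams.
        gv = sum((g * x).sum() for g, x in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)
        # Rayleigh quotient v . Hv with a unit-norm v.
        lam = sum((h * x).sum() for h, x in zip(hv, v)).item()
        v = [h.detach() for h in hv]
    return lam
```

Calling this on the training loss every few hundred SGD steps, under different batch sizes and learning rates, would yield the kind of curvature trace the abstract refers to.
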
Extending the Framework of Equilibrium Propagation to General Dynamics
A3T: Adversarially Augmented Adversarial Training
Recent research showed that deep neural networks are highly sensitive to so-called adversarial perturbations: tiny perturbations of the input data purposely designed to fool a machine learning classifier. Most classification models, including deep learning models, are highly vulnerable to adversarial attacks. In this work, we investigate a procedure for improving the adversarial robustness of deep neural networks by enforcing representation invariance. The idea is to train the classifier jointly with a discriminator attached to one of its hidden layers and trained to filter the adversarial noise. We perform preliminary experiments to test the viability of the approach and to compare it to other standard adversarial training methods.
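
The recipe can be sketched as a small adversarial game. The code below is a hedged illustration, not the paper's exact objective: it crafts adversarial examples with single-step FGSM, trains a discriminator to tell clean from adversarial hidden features, and trains the classifier to fool that discriminator while classifying both versions correctly. All names, the choice of attack, the assumed (batch, 1) discriminator output, and the equal loss weights are assumptions.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    """Single-step FGSM perturbation of the inputs."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).detach()

def invariance_step(encoder, head, disc, opt_model, opt_disc, x, y):
    """One joint update. `opt_model` covers encoder + head parameters;
    `opt_disc` covers the discriminator attached to the hidden layer."""
    model = lambda inp: head(encoder(inp))
    x_adv = fgsm(model, x, y)

    # Discriminator step: label clean features 1, adversarial features 0.
    h_clean, h_adv = encoder(x).detach(), encoder(x_adv).detach()
    d_loss = (F.binary_cross_entropy_with_logits(
                  disc(h_clean), torch.ones(len(x), 1))
              + F.binary_cross_entropy_with_logits(
                  disc(h_adv), torch.zeros(len(x), 1)))
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

    # Model step: classify both versions correctly, and push adversarial
    # features to look "clean" to the discriminator (representation invariance).
    h_adv = encoder(x_adv)
    m_loss = (F.cross_entropy(head(encoder(x)), y)
              + F.cross_entropy(head(h_adv), y)
              + F.binary_cross_entropy_with_logits(
                    disc(h_adv), torch.ones(len(x), 1)))
    opt_model.zero_grad(); m_loss.backward(); opt_model.step()
```
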
Bayesian Model-Agnostic Meta-Learning
Jaesik Yoon
Ousmane Dia
Sungwoong Kim
Learning to infer a Bayesian posterior from a few-shot dataset is an important step towards robust meta-learning, due to the model uncertainty inherent in the problem. In this paper, we propose a novel Bayesian model-agnostic meta-learning method. The proposed method combines scalable gradient-based meta-learning with nonparametric variational inference in a principled probabilistic framework. During fast adaptation, the method is capable of learning complex uncertainty structure beyond a point estimate or a simple Gaussian approximation. In addition, a robust Bayesian meta-update mechanism with a new meta-loss prevents overfitting during the meta-update. The method remains an efficient gradient-based meta-learner that is model-agnostic and simple to implement. Experimental results show the accuracy and robustness of the proposed method on various tasks: sinusoidal regression, image classification, active learning, and reinforcement learning.
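
As a rough illustration of the "fast adaptation with uncertainty" part, here is a first-order sketch in which the posterior is represented by an ensemble of particles, each adapted by a few gradient steps on the support set. The paper's nonparametric variational inference couples the particles (via Stein variational gradient descent) and adds a Bayesian meta-update, both of which this simplification omits; all names and hyperparameters are assumptions.

```python
import copy
import torch
import torch.nn.functional as F

def fast_adapt(model, x_support, y_support, steps=5, inner_lr=0.01):
    """Inner loop: adapt a copy of one particle to a few-shot support
    set with plain gradient steps (first-order, for simplicity)."""
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for _ in range(steps):
        opt.zero_grad()
        F.mse_loss(adapted(x_support), y_support).backward()
        opt.step()
    return adapted

def posterior_predictive(particles, x_query):
    """Approximate the posterior predictive by averaging the particles'
    outputs; their spread is a crude uncertainty estimate."""
    with torch.no_grad():
        preds = torch.stack([p(x_query) for p in particles])
    return preds.mean(dim=0), preds.var(dim=0)

# Usage sketch: adapt every particle of the ensemble to a new task,
# then read off a mean prediction and an uncertainty estimate.
# adapted = [fast_adapt(p, x_s, y_s) for p in ensemble]
# mean, var = posterior_predictive(adapted, x_q)
```
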
BigBrain: 1D convolutional neural networks for automated segmentation of cortical layers
Konrad Wagstyl
Claude Lepage
Karl Zilles
Sebastian Bludau
Guillem Cucurull
Alan C. Evans
Paul C Fletcher
Adriana Romero
Thomas Funck
Katrin Amunts