Portrait of Yoshua Bengio

Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research Department
Founder and Scientific Advisor, Leadership Team
Research Topics
Causality
Computational Neuroscience
Deep Learning
Generative Models
Graph Neural Networks
Machine Learning Theory
Medical Machine Learning
Molecular Modeling
Natural Language Processing
Probabilistic Models
Reasoning
Recurrent Neural Networks
Reinforcement Learning
Representation Learning

Biography

*For media requests, please write to medias@mila.quebec.

For more information please contact Cassidy MacNeil, Senior Assistant and Operation Lead at cassidy.macneil@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is most known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific advisor of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as special advisor and founding scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize and in 2022, he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Publications

S2RMs: Spatially Structured Recurrent Modules
Nasim Rahaman
Muhammad Waleed Gondal
Manuel Wuthrich
Y. Sharma
Bernhard Schölkopf
Compositional Generalization by Factorizing Alignment and Translation
Jacob Russin
R. O’Reilly
Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach
Wenyu Du
Timothy J. O’Donnell
Yue Zhang
It is commonly believed that knowledge of syntactic structure should improve language modeling. However, effectively and computationally eff… (see more)iciently incorporating syntactic structure into neural language models has been a challenging topic. In this paper, we make use of a multi-task objective, i.e., the models simultaneously predict words as well as ground truth parse trees in a form called "syntactic distances", where information between these two separate objectives shares the same intermediate representation. Experimental results on the Penn Treebank and Chinese Treebank datasets show that when ground truth parse trees are provided as additional training signals, the model is able to achieve lower perplexity and induce trees with better quality.
Factorized embeddings learns rich and biologically meaningful embedding spaces using factorized tensor decomposition
The recent development of sequencing technologies revolutionized our understanding of the inner workings of the cell as well as the way dise… (see more)ase is treated. A single RNA sequencing (RNA-Seq) experiment, however, measures tens of thousands of parameters simultaneously. While the results are information rich, data analysis provides a challenge. Dimensionality reduction methods help with this task by extracting patterns from the data by compressing it into compact vector representations. We present the factorized embeddings (FE) model, a self-supervised deep learning algorithm that learns simultaneously, by tensor factorization, gene and sample representation spaces. We ran the model on RNA-Seq data from two large-scale cohorts and observed that the sample representation captures information on single gene and global gene expression patterns. Moreover, we found that the gene representation space was organized such that tissue-specific genes, highly correlated genes as well as genes participating in the same GO terms were grouped. Finally, we compared the vector representation of samples learned by the FE model to other similar models on 49 regression tasks. We report that the representations trained with FE rank first or second in all of the tasks, surpassing, sometimes by a considerable margin, other representations. A toy example in the form of a Jupyter Notebook as well as the code and trained embeddings for this project can be found at: https://github.com/TrofimovAssya/FactorizedEmbeddings. Supplementary data are available at Bioinformatics online.
Object Files and Schemata: Factorizing Declarative and Procedural Knowledge in Dynamical Systems
Philippe Beaudoin
Sergey Levine
Charles Blundell
Michael Curtis Mozer
Inherent privacy limitations of decentralized contact tracing apps
Daphne Ippolito
Richard Janda
Max Jarvie
Jean-François Rousseau
Abhinav Sharma
Yun William Yu
Image-to-image Mapping with Many Domains by Sparse Attribute Transfer
Rethinking Distributional Matching Based Domain Adaptation
Bin Li
Yezhen Wang
Tong Che
Shanghang Zhang
Sicheng Zhao
Pengfei Xu
Wenzhen Zhou
Kurt W. Keutzer
Domain adaptation (DA) is a technique that transfers predictive models trained on a labeled source domain to an unlabeled target domain, wit… (see more)h the core difficulty of resolving distributional shift between domains. Currently, most popular DA algorithms are based on distributional matching (DM). However in practice, realistic domain shifts (RDS) may violate their basic assumptions and as a result these methods will fail. In this paper, in order to devise robust DA algorithms, we first systematically analyze the limitations of DM based methods, and then build new benchmarks with more realistic domain shifts to evaluate the well-accepted DM methods. We further propose InstaPBM, a novel Instance-based Predictive Behavior Matching method for robust DA. Extensive experiments on both conventional and RDS benchmarks demonstrate both the limitations of DM methods and the efficacy of InstaPBM: Compared with the best baselines, InstaPBM improves the classification accuracy respectively by
HNHN: Hypergraph Networks with Hyperedge Neurons
Yihe Dong
W. Sawin
Quantized Guided Pruning for Efficient Hardware Implementations of Deep Neural Networks
Matthieu Arzel
Nicolas Farrugia
Deep Neural Networks (DNNs) in general and Convolutional Neural Networks (CNNs) in particular are state-of-the-art in numerous computer visi… (see more)on tasks such as object classification and detection. However, the large amount of parameters they contain leads to a high computational complexity and strongly limits their usability in budget-constrained devices such as embedded devices. In this paper, we propose a combination of a pruning technique and a quantization scheme that effectively reduce the complexity and memory usage of convolutional layers of CNNs, by replacing the complex convolutional operation by a low-cost multiplexer. We perform experiments on CIFAR10, CIFAR100 and SVHN datasets and show that the proposed method achieves almost state-of-the-art accuracy, while drastically reducing the computational and memory footprints compared to the baselines. We also propose an efficient hardware architecture, implemented on Field Programmable Gate Arrays (FPGAs), to accelerate inference, which works as a pipeline and accommodates multiple layers working at the same time to speed up the inference process. In contrast with most proposed approaches which have used external memory or software defined memory controllers, our work is based on algorithmic optimization and full-hardware design, enabling a direct, on-chip memory implementation of a DNN while keeping close to state of the art accuracy.
Untangling tradeoffs between recurrence and self-attention in neural networks
Learning Causal Models Online