Portrait of Yoshua Bengio

Yoshua Bengio

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research Department
Scientific Director, Leadership Team
Observer, Board of Directors, Mila

Biography

*For media requests, please write to medias@mila.quebec.

For more information please contact Julie Mongeau, executive assistant at julie.mongeau@mila.quebec.

Yoshua Bengio is recognized worldwide as a leading expert in AI. He is most known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of computing,” with Geoffrey Hinton and Yann LeCun.

Bengio is a full professor at Université de Montréal, and the founder and scientific director of Mila – Quebec Artificial Intelligence Institute. He is also a senior fellow at CIFAR and co-directs its Learning in Machines & Brains program, serves as scientific director of IVADO, and holds a Canada CIFAR AI Chair.

In 2019, Bengio was awarded the prestigious Killam Prize and in 2022, he was the most cited computer scientist in the world by h-index. He is a Fellow of the Royal Society of London, Fellow of the Royal Society of Canada, Knight of the Legion of Honor of France and Officer of the Order of Canada. In 2023, he was appointed to the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

Concerned about the social impact of AI, Bengio helped draft the Montréal Declaration for the Responsible Development of Artificial Intelligence and continues to raise awareness about the importance of mitigating the potentially catastrophic risks associated with future AI systems.

Current Students

Research Intern - Université de Montréal
PhD - Université de Montréal
Research Intern - Université du Québec à Rimouski
Professional Master's - Université de Montréal
Independent visiting researcher
Co-supervisor :
Independent visiting researcher - UQAR
PhD - Université de Montréal
Independent visiting researcher - MIT
PhD - Université de Montréal
Postdoctorate - Université de Montréal
Co-supervisor :
Professional Master's - Université de Montréal
Professional Master's - Université de Montréal
Collaborating Alumni - Université de Montréal
Collaborating researcher - Université Paris-Saclay
Principal supervisor :
PhD - Université de Montréal
PhD - Massachusetts Institute of Technology
PhD - Université de Montréal
PhD - Université de Montréal
Professional Master's - Université de Montréal
Professional Master's - Université de Montréal
Professional Master's - Université de Montréal
Collaborating researcher
Postdoctorate - Université de Montréal
Co-supervisor :
Independent visiting researcher - Technical University Munich (TUM)
PhD - Université de Montréal
Research Intern - Université de Montréal
Master's Research - Université de Montréal
Co-supervisor :
Research Intern - Université de Montréal
Collaborating researcher - Université de Montréal
PhD - Université de Montréal
Postdoctorate - Université de Montréal
PhD - Université de Montréal
Collaborating Alumni
Research Intern - Université de Montréal
Professional Master's - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Research Intern - McGill University
Research Intern - Imperial College London
PhD - Université de Montréal
Research Intern - Université de Montréal
Collaborating Alumni - Université de Montréal
DESS - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Postdoctorate - Université de Montréal
Collaborating researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
Professional Master's - Université de Montréal
Independent visiting researcher - Université de Montréal
Independent visiting researcher - Hong Kong University of Science and Technology (HKUST)
Collaborating researcher - Ying Wu Coll of Computing
Professional Master's - Université de Montréal
Undergraduate - Université de Montréal
PhD - Max-Planck-Institute for Intelligent Systems
Professional Master's - Université de Montréal
Independent visiting researcher - Université de Montréal
Independent visiting researcher - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher
Principal supervisor :
Postdoctorate - Université de Montréal
Master's Research - Université de Montréal
Research Intern - Université de Montréal
Master's Research - Université de Montréal
Professional Master's - Université de Montréal
Independent visiting researcher - Technical University of Munich
PhD - École Polytechnique Montréal Fédérale de Lausanne
PhD - Université de Montréal
Co-supervisor :
Collaborating researcher
Principal supervisor :
Postdoctorate - Université de Montréal
Collaborating researcher - Valence
Principal supervisor :
Postdoctorate - Université de Montréal
Co-supervisor :
Collaborating researcher - RWTH Aachen University (Rheinisch-Westfälische Technische Hochschule Aachen)
Principal supervisor :
PhD - Université de Montréal
Professional Master's - Université de Montréal
Collaborating Alumni - Université de Montréal
Research Intern - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - McGill University
Principal supervisor :
PhD - McGill University
Principal supervisor :

Publications

HeMIS: Hetero-Modal Image Segmentation
Mohammad Havaei
Nicolas Guizard
Theano: A Python framework for fast computation of mathematical expressions
Rami Al-rfou'
Guillaume Alain
Amjad Almahairi
Christof Angermüller
Nicolas Ballas
Frédéric Bastien
Justin S. Bayer
A. Belikov
A. Belopolsky
Arnaud Bergeron
J. Bergstra
Valentin Bisson
Josh Bleecher Snyder
Nicolas Bouchard
Nicolas Boulanger-Lewandowski
Xavier Bouthillier
Alexandre De Brébisson
Olivier Breuleux … (see 92 more)
pierre luc carrier
Kyunghyun Cho
Jan Chorowski
Paul F. Christiano
Tim Cooijmans
Marc-Alexandre Côté
Myriam Côté
Yann Dauphin
Olivier Delalleau
Julien Demouth
Guillaume Desjardins
Sander Dieleman
Laurent Dinh
M'elanie Ducoffe
Vincent Dumoulin
Dumitru Erhan
Ziye Fan
Orhan Firat
Mathieu Germain
Xavier Glorot
Ian J. Goodfellow
Matthew Graham
Caglar Gulcehre
Philippe Hamel
Iban Harlouchet
Jean-philippe Heng
Balázs Hidasi
Sina Honari
Arjun Jain
S'ebastien Jean
Kai Jia
Mikhail V. Korobov
Vivek Kulkarni
Alex Lamb
Pascal Lamblin
Eric P. Larsen
César Laurent
S. Lee
Simon-mark Lefrancois
Simon Lemieux
Nicholas Léonard
Zhouhan Lin
J. Livezey
Cory R. Lorenz
Jeremiah L. Lowin
Qianli M. Ma
Pierre-Antoine Manzagol
Olivier Mastropietro
R. McGibbon
Roland Memisevic
Bart van Merriënboer
Vincent Michalski
Mehdi Mirza
Alberto Orlandi
Razvan Pascanu
Mohammad Pezeshki
Colin Raffel
Daniel Renshaw
Matthew David Rocklin
Markus Dr. Roth
Peter Sadowski
John Salvatier
Francois Savard
Jan Schlüter
John D. Schulman
Gabriel Schwartz
Iulian V. Serban
Dmitriy Serdyuk
Samira Shabanian
Etienne Simon
Sigurd Spieckermann
S. Subramanyam
Jakub Sygnowski
Jérémie Tanguay
Gijs van Tulder
Joseph P. Turian
Sebastian Urban
Francesco Visin
Harm de Vries
David Warde-Farley
Dustin J. Webb
M. Willson
Kelvin Xu
Lijun Xue
Li Yao
Saizheng Zhang
Ying Zhang
Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficie… (see more)ntly. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano is being actively and continuously developed since 2008, multiple frameworks have been built on top of it and it has been used to produce many state-of-the-art machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.
Theano: A Python framework for fast computation of mathematical expressions
Rami Al-rfou'
Guillaume Alain
Amjad Almahairi
Christof Angermüller
Nicolas Ballas
Frédéric Bastien
Justin S. Bayer
A. Belikov
A. Belopolsky
Arnaud Bergeron
James Bergstra
Valentin Bisson
Josh Bleecher Snyder
Nicolas Bouchard
Nicolas Boulanger-Lewandowski
Xavier Bouthillier
Alexandre De Brébisson
Olivier Breuleux … (see 92 more)
pierre luc carrier
Kyunghyun Cho
Jan Chorowski
Paul F. Christiano
Tim Cooijmans
Marc-Alexandre Côté
Myriam Côté
Yann Dauphin
Olivier Delalleau
Julien Demouth
Guillaume Desjardins
Sander Dieleman
Laurent Dinh
M'elanie Ducoffe
Vincent Dumoulin
Dumitru Erhan
Ziye Fan
Orhan Firat
Mathieu Germain
Xavier Glorot
Ian G Goodfellow
Matthew Graham
Caglar Gulcehre
Philippe Hamel
Iban Harlouchet
Jean-philippe Heng
Balázs Hidasi
Sina Honari
Arjun Jain
Sébastien Jean
Kai Jia
Mikhail V. Korobov
Vivek Kulkarni
Alex Lamb
Pascal Lamblin
Eric Larsen
César Laurent
S. Lee
Simon-mark Lefrancois
Simon Lemieux
Nicholas Léonard
Zhouhan Lin
J. Livezey
Cory R. Lorenz
Jeremiah L. Lowin
Qianli M. Ma
Pierre-Antoine Manzagol
Olivier Mastropietro
R. McGibbon
Roland Memisevic
Bart van Merriënboer
Vincent Michalski
Mehdi Mirza
Alberto Orlandi
Razvan Pascanu
Mohammad Pezeshki
Colin Raffel
Daniel Renshaw
Matthew David Rocklin
Markus Dr. Roth
Peter Sadowski
John Salvatier
Francois Savard
Jan Schlüter
John D. Schulman
Gabriel Schwartz
Iulian V. Serban
Dmitriy Serdyuk
Samira Shabanian
Etienne Simon
Sigurd Spieckermann
S. Subramanyam
Jakub Sygnowski
Jérémie Tanguay
Gijs van Tulder
Joseph P. Turian
Sebastian Urban
Francesco Visin
Harm de Vries
David Warde-Farley
Dustin J. Webb
M. Willson
Kelvin Xu
Lijun Xue
Li Yao
Saizheng Zhang
Ying Zhang
Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficie… (see more)ntly. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano is being actively and continuously developed since 2008, multiple frameworks have been built on top of it and it has been used to produce many state-of-the-art machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.
Statistical Language and Speech Processing
Fethi Bougares
Horia Cucu
Corneliu Burileanu
Myung-Jae Kim
Il-ho Yang
Jordan Rodu
Dean Phillips Foster
Weichen Wu
Stefan Bott
Felix Stahlberg
Luis A. Trindade
Hui Wang
Statistical Language and Speech Processing
Fethi Bougares
Horia Cucu
Corneliu Burileanu
Myung-Jae Kim
Il-ho Yang
Jordan Rodu
Dean Phillips Foster
Weichen Wu
Stefan Bott
Felix Stahlberg
Luis A. Trindade
Hui Wang
Theano: Deep Learning on GPUs with Python
James Bergstra
Frédéric Bastien
Olivier Breuleux
Pascal Lamblin
Razvan Pascanu
Olivier Delalleau
Guillaume Desjardins
David Warde-Farley
Ian G Goodfellow
Arnaud Bergeron
In this paper, we present Theano 1 , a framework in the Python programming language for defining, optimizing and evaluating expressions invo… (see more)lving high-level operations on tensors. Theano offers most of NumPy’s functionality, but adds automatic symbolic differentiation, GPU support, and faster expression evaluation. Theano is a general mathematical tool, but it was developed with the goal of facilitating research in deep learning. The Deep Learning Tutorials 2 introduce recent advances in deep learning, and showcase how Theano
Theano: A CPU and GPU Math Compiler in Python
James Bergstra
Olivier Breuleux
Frédéric Bastien
Pascal Lamblin
Razvan Pascanu
Guillaume Desjardins
Joseph P. Turian
David Warde-Farley
Theano: A CPU and GPU Math Compiler in Python
James Bergstra
Olivier Breuleux
Frédéric Bastien
Pascal Lamblin
Razvan Pascanu
Guillaume Desjardins
Joseph P. Turian
David Warde-Farley
Theano is a compiler for mathematical expressions in Python that combines the convenience of NumPy's syntax with the speed of optimized nati… (see more)ve machine language. The user composes mathematical expressions in a high-level description that mimics NumPy's syntax and semantics, while being statically typed and functional (as opposed to imperative). These expressions allow Theano to provide symbolic differentiation. Before performing computation, Theano optimizes the choice of expressions, translates them into C++ (or CUDA for GPU), compiles them into dynamically loaded Python modules, all automatically. Common machine learn- ing algorithms implemented with Theano are from 1:6 to 7:5 faster than competitive alternatives (including those implemented with C/C++, NumPy/SciPy and MATLAB) when compiled for the CPU and between 6:5 and 44 faster when compiled for the GPU. This paper illustrates how to use Theano, outlines the scope of the compiler, provides benchmarks on both CPU and GPU processors, and explains its overall design.
L AUGHING H YENA D ISTILLERY Extracting Compact Recurrences From Convolutions
∗. StefanoMassaroli
∗. MichaelPoli
∗. DanielY.Fu
Hermann Kumbong
Rom N. Parnichkun
Aman Timalsina
David W. Romero
Quinn McIntyre
Beidi Chen
Atri Rudra
Ce Zhang
Christopher Re
Stefano Ermon
Recent advances in attention-free sequence models rely on convolutions as alternatives to the attention operator at the core of Transformers… (see more). In particular, long convolution sequence models have achieved state-of-the-art performance in many domains, but incur a significant cost during auto-regressive inference workloads – naively requiring a full pass (or caching of activations) over the input sequence for each generated token – similarly to attention-based models. In this paper, we seek to enable O (1) compute and memory cost per token in any pre-trained long convolution architecture to reduce memory footprint and increase throughput during generation. Concretely, our methods consist in extracting low-dimensional linear state-space models from each convolution layer, building upon rational interpolation and model-order reduction techniques. We further introduce architectural improvements to convolution-based layers such as Hyena : by weight-tying the filters across channels into heads , we achieve higher pre-training quality and reduce the number of filters to be distilled. The resulting model achieves 10 × higher throughput than Transformers and 1 . 5 × higher than Hyena at 1 . 3 B parameters, without any loss in quality after distillation.