Portrait of Blake Richards

Blake Richards

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science and Department of Neurology and Neurosurgery
Google
Research Topics
Computational Neuroscience
Generative Models
Reinforcement Learning
Representation Learning

Biography

Blake Richards is an associate professor at the School of Computer Science and in the Department of Neurology and Neurosurgery at McGill University, and a core academic member of Mila – Quebec Artificial Intelligence Institute.

Richards’ research lies at the intersection of neuroscience and AI. His laboratory investigates universal principles of intelligence that apply to both natural and artificial agents.

He has received several awards for his work, including the NSERC Arthur B. McDonald Fellowship in 2022, the Canadian Association for Neuroscience Young Investigator Award in 2019, and a Canada CIFAR AI Chair in 2018. Richards was a Banting Postdoctoral Fellow at SickKids Hospital from 2011 to 2013.

He obtained his PhD in neuroscience from the University of Oxford in 2010, and his BSc in cognitive science and AI from the University of Toronto in 2004.

Current Students

Research Intern - Université de Montréal
Independent visiting researcher - Seoul National University
Postdoctorate - McGill University
Postdoctorate - Université de Montréal
Principal supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
PhD - McGill University
Postdoctorate - McGill University
PhD - McGill University
PhD - McGill University
Independent visiting researcher - Seoul National University
PhD - McGill University
Research Intern - McGill University
Collaborating Alumni
PhD - McGill University
Independent visiting researcher - ETH Zurich
Collaborating researcher - Georgia Tech
Postdoctorate - McGill University
Undergraduate - McGill University
PhD - McGill University
Master's Research - McGill University
PhD - Université de Montréal
Principal supervisor :
Undergraduate - McGill University
Master's Research - McGill University
Independent visiting researcher
Postdoctorate - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
Master's Research - McGill University
Co-supervisor :
Master's Research - McGill University
PhD - McGill University
Master's Research - McGill University
Co-supervisor :
Independent visiting researcher - Seoul National University
Independent visiting researcher - York University
PhD - McGill University
PhD - Concordia University
Principal supervisor :

Publications

Catalyzing next-generation Artificial Intelligence through NeuroAI
Anthony Zador
Sean Escola
Bence Ölveczky
Kwabena Boahen
Matthew Botvinick
Dmitri Chklovskii
Anne Churchland
Claudia Clopath
James DiCarlo
Surya
Surya Ganguli
Jeff Hawkins
Konrad Paul Kording
Alexei Koulakov
Yann LeCun
Timothy P. Lillicrap
Adam
Adam Marblestone … (see 9 more)
Bruno Olshausen
Alexandre Pouget
Cristina Savin
Terrence Sejnowski
Eero Simoncelli
Sara Solla
David Sussillo
Andreas S. Tolias
Doris Tsao
Transfer Entropy Bottleneck: Learning Sequence to Sequence Information Transfer
Damjan Kalajdzievski
Ximeng Mao
Pascal Fortier-Poisson
When presented with a data stream of two statistically dependent variables, predicting the future of one of the variables (the target stream… (see more)) can benefit from information about both its history and the history of the other variable (the source stream). For example, fluctuations in temperature at a weather station can be predicted using both temperatures and barometric readings. However, a challenge when modelling such data is that it is easy for a neural network to rely on the greatest joint correlations within the target stream, which may ignore a crucial but small information transfer from the source to the target stream. As well, there are often situations where the target stream may have previously been modelled independently and it would be useful to use that model to inform a new joint model. Here, we develop an information bottleneck approach for conditional learning on two dependent streams of data. Our method, which we call Transfer Entropy Bottleneck (TEB), allows one to learn a model that bottlenecks the directed information transferred from the source variable to the target variable, while quantifying this information transfer within the model. As such, TEB provides a useful new information bottleneck approach for modelling two statistically dependent streams of data in order to make predictions about one of them.
How gradient estimator variance and bias impact learning in neural networks
Arna Ghosh
Yuhan Helena Liu
Konrad Paul Kording
There is growing interest in understanding how real brains may approximate gradients and how gradients can be used to train neuromorphic chi… (see more)ps. However, neither real brains nor neuromorphic chips can perfectly follow the loss gradient, so parameter updates would necessarily use gradient estimators that have some variance and/or bias. Therefore, there is a need to understand better how variance and bias in gradient estimators impact learning dependent on network and task properties. Here, we show that variance and bias can impair learning on the training data, but some degree of variance and bias in a gradient estimator can be beneficial for generalization. We find that the ideal amount of variance and bias in a gradient estimator are dependent on several properties of the network and task: the size and activity sparsity of the network, the norm of the gradient, and the curvature of the loss landscape. As such, whether considering biologically-plausible learning algorithms or algorithms for training neuromorphic chips, researchers can analyze these properties to determine whether their approximation to gradient descent will be effective for learning given their network and task properties.
Formalizing locality for normative synaptic plasticity models
Colin Bredenberg
Ezekiel Williams
Cristina Savin
H OW GRADIENT ESTIMATOR VARIANCE AND BIAS COULD IMPACT LEARNING IN NEURAL CIRCUITS
Arna Ghosh
Yuhan Helena Liu
Konrad K¨ording
There is growing interest in understanding how real brains may approximate gradients and how gradients can be used to train neuromorphic chi… (see more)ps. However, neither real brains nor neuromorphic chips can perfectly follow the loss gradient, so parameter updates would necessarily use gradient estimators that have some variance and/or bias. Therefore, there is a need to understand better how variance and bias in gradient estimators impact learning dependent on network and task properties. Here, we show that variance and bias can impair learning on the training data, but some degree of variance and bias in a gradient estimator can be beneficial for generalization. We find that the ideal amount of variance and bias in a gradient estimator are dependent on several properties of the network and task: the size and activity sparsity of the network, the norm of the gradient, and the curvature of the loss landscape. As such, whether considering biologically-plausible learning algorithms or algorithms for training neuromorphic chips, researchers can analyze these properties to determine whether their approximation to gradient descent will be effective for learning given their network and task properties.
Stimulus information guides the emergence of behavior related signals in primary somatosensory cortex during learning
Mariangela Panniello
Colleen J Gillon
Roberto Maffulli
Marco Celotto
Stefano Panzeri
Michael M Kohl
Adult neurogenesis acts as a neural regularizer
Lina M. Tran
Adam Santoro
Lulu Liu
Sheena A. Josselyn
Paul W. Frankland
A Generalized Bootstrap Target for Value-Learning, Efficiently Combining Value and Feature Predictions
Anthony GX-Chen
Veronica Chelu
Towards Scaling Difference Target Propagation by Learning Backprop Targets
Maxence Ernoult
Fabrice Normandin
Abhinav Moudgil
Sean Spinney
The development of biologically-plausible learning algorithms is important for understanding learning in the brain, but most of them fail to… (see more) scale-up to real-world tasks, limiting their potential as explanations for learning by real brains. As such, it is important to explore learning algorithms that come with strong theoretical guarantees and can match the performance of backpropagation (BP) on complex tasks. One such algorithm is Difference Target Propagation (DTP), a biologically-plausible learning algorithm whose close relation with Gauss-Newton (GN) optimization has been recently established. However, the conditions under which this connection rigorously holds preclude layer-wise training of the feedback pathway synaptic weights (which is more biologically plausible). Moreover, good alignment between DTP weight updates and loss gradients is only loosely guaranteed and under very specific conditions for the architecture being trained. In this paper, we propose a novel feedback weight training scheme that ensures both that DTP approximates BP and that layer-wise feedback weight training can be restored without sacrificing any theoretical guarantees. Our theory is corroborated by experimental results and we report the best performance ever achieved by DTP on CIFAR-10 and ImageNet 32
On Neural Architecture Inductive Biases for Relational Tasks
Current deep learning approaches have shown good in-distribution generalization performance, but struggle with out-of-distribution generaliz… (see more)ation. This is especially true in the case of tasks involving abstract relations like recognizing rules in sequences, as we find in many intelligence tests. Recent work has explored how forcing relational representations to remain distinct from sensory representations, as it seems to be the case in the brain, can help artificial systems. Building on this work, we further explore and formalize the advantages afforded by 'partitioned' representations of relations and sensory details, and how this inductive bias can help recompose learned relational structure in newly encountered settings. We introduce a simple architecture based on similarity scores which we name Compositional Relational Network (CoRelNet). Using this model, we investigate a series of inductive biases that ensure abstract relations are learned and represented distinctly from sensory data, and explore their effects on out-of-distribution generalization for a series of relational psychophysics tasks. We find that simple architectural choices can outperform existing models in out-of-distribution generalization. Together, these results show that partitioning relational representations from other information streams may be a simple way to augment existing network architectures' robustness when performing out-of-distribution relational computations.
Evaluating Multimodal Interactive Agents
Josh Abramson
Arun Ahuja
Federico Carnevale
Petko Georgiev
Alex Goldin
Alden Hung
Jessica Landon
Timothy P. Lillicrap
Alistair M. Muldal
Adam Santoro
Tamara von Glehn
Greg Wayne
Nathaniel Wong
Chen Yan
Creating agents that can interact naturally with humans is a common goal in artificial intelligence (AI) research. However, evaluating these… (see more) interactions is challenging: collecting online human-agent interactions is slow and expensive, yet faster proxy metrics often do not correlate well with interactive evaluation. In this paper, we assess the merits of these existing evaluation metrics and present a novel approach to evaluation called the Standardised Test Suite (STS). The STS uses behavioural scenarios mined from real human interaction data. Agents see replayed scenario context, receive an instruction, and are then given control to complete the interaction offline. These agent continuations are recorded and sent to human annotators to mark as success or failure, and agents are ranked according to the proportion of continuations in which they succeed. The resulting STS is fast, controlled, interpretable, and representative of naturalistic interactions. Altogether, the STS consolidates much of what is desirable across many of our standard evaluation metrics, allowing us to accelerate research progress towards producing agents that can interact naturally with humans. A video may be found at https://youtu.be/YR1TngGORGQ.
Current State and Future Directions for Learning in Biological Recurrent Neural Networks: A Perspective Piece
Luke Y. Prince
Roy Henha Eyono
Ellen Boven
Arna Ghosh
Joseph Pemberton
Franz Scherr
Claudia Clopath
Rui Ponte Costa
Wolfgang Maass
Cristina Savin
Katharina Wilmes
We provide a brief review of the common assumptions about biological learning with findings from experimental neuroscience and contrast them… (see more) with the efficiency of gradient-based learning in recurrent neural networks. The key issues discussed in this review include: synaptic plasticity, neural circuits, theory-experiment divide, and objective functions. We conclude with recommendations for both theoretical and experimental neuroscientists when designing new studies that could help bring clarity to these issues.