
Blake Richards

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science and Department of Neurology and Neurosurgery
Research Topics
Computational Neuroscience
Generative Models
Reinforcement Learning
Representation Learning

Biography

Blake Richards is an associate professor at the School of Computer Science and in the Department of Neurology and Neurosurgery at McGill University, and a core academic member of Mila – Quebec Artificial Intelligence Institute.

Richards’ research lies at the intersection of neuroscience and AI. His laboratory investigates universal principles of intelligence that apply to both natural and artificial agents.

He has received several awards for his work, including the NSERC Arthur B. McDonald Fellowship in 2022, the Canadian Association for Neuroscience Young Investigator Award in 2019, and a Canada CIFAR AI Chair in 2018. Richards was a Banting Postdoctoral Fellow at SickKids Hospital from 2011 to 2013.

He obtained his PhD in neuroscience from the University of Oxford in 2010, and his BSc in cognitive science and AI from the University of Toronto in 2004.

Current Students

Independent visiting researcher - Seoul National University
Research Intern - McGill University
Postdoctorate - McGill University
Postdoctorate - Université de Montréal
Principal supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
PhD - McGill University
Postdoctorate - McGill University
Research Intern - McGill University
PhD - McGill University
Independent visiting researcher - Seoul National University
PhD - McGill University
Undergraduate - McGill University
Collaborating Alumni
Independent visiting researcher - University of Oregon
PhD - McGill University
Independent visiting researcher - ETH Zurich
Collaborating researcher - Georgia Tech
Postdoctorate - McGill University
Postdoctorate - McGill University
Undergraduate - McGill University
PhD - McGill University
Master's Research - McGill University
PhD - Université de Montréal
Principal supervisor :
Undergraduate - McGill University
Master's Research - McGill University
Collaborating Alumni
Independent visiting researcher
Postdoctorate - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
Research Intern - University of Oslo
Master's Research - McGill University
Co-supervisor :
Master's Research - McGill University
PhD - McGill University
Master's Research - McGill University
Co-supervisor :
Independent visiting researcher - York University
PhD - McGill University

Publications

A Unified, Scalable Framework for Neural Population Decoding
Mehdi Azabou
Vinam Arora
Venkataramana Ganesh
Ximeng Mao
Santosh B Nachimuthu
Michael Jacob Mendelson
Eva L Dyer
Our ability to use deep learning approaches to decipher neural activity would likely benefit from greater scale, in terms of both the model size and the datasets. However, the integration of many neural recordings into one unified model is challenging, as each recording contains the activity of different neurons from different individual animals. In this paper, we introduce a training framework and architecture designed to model the population dynamics of neural activity across diverse, large-scale neural recordings. Our method first tokenizes individual spikes within the dataset to build an efficient representation of neural events that captures the fine temporal structure of neural activity. We then employ cross-attention and a PerceiverIO backbone to further construct a latent tokenization of neural population activities. Utilizing this architecture and training framework, we construct a large-scale multi-session model trained on large datasets from seven nonhuman primates, spanning over 158 different sessions of recording from over 27,373 neural units and over 100 hours of recordings. In a number of different tasks, we demonstrate that our pretrained model can be rapidly adapted to new, unseen sessions with unspecified neuron correspondence, enabling few-shot performance with minimal labels. This work presents a powerful new approach for building deep learning tools to analyze neural data and stakes out a clear path to training at scale for neural decoding models.
The neuroconnectionist research programme
Adrien C. Doerig
R. Sommers
Katja Seeliger
J. Ismael
Grace W. Lindsay
Konrad Paul Kording
Talia Konkle
M. Gerven
Nikolaus Kriegeskorte
Tim Kietzmann
Responses of pyramidal cell somata and apical dendrites in mouse visual cortex over multiple days
Colleen J Gillon
Jérôme A. Lecoq
Jason E. Pina
Ruweida Ahmed
Yazan N. Billeh
Shiella Caldejon
Peter Groblewski
Timothy M. Henley
India Kato
Eric Lee
Jennifer Luviano
Kyla Mace
Chelsea Nayan
Thuyanh V. Nguyen
Kat North
Jed Perkins
Sam Seid
Matthew T. Valley
Ali Williford
Timothy P. Lillicrap
Joel Zylberberg
The study of plasticity has always been about gradients
Konrad Paul Kording
Catalyzing next-generation Artificial Intelligence through NeuroAI
Anthony Zador
Sean Escola
Bence Ölveczky
Kwabena Boahen
Matthew Botvinick
Dmitri Chklovskii
Anne Churchland
Claudia Clopath
James DiCarlo
Surya Ganguli
Jeff Hawkins
Konrad Paul Kording
Alexei Koulakov
Yann LeCun
Timothy P. Lillicrap
Adam Marblestone
Bruno Olshausen
Alexandre Pouget
Cristina Savin
Terrence Sejnowski
Eero Simoncelli
Sara Solla
David Sussillo
Andreas S. Tolias
Doris Tsao
Transfer Entropy Bottleneck: Learning Sequence to Sequence Information Transfer
Damjan Kalajdzievski
Ximeng Mao
Pascal Fortier-Poisson
When presented with a data stream of two statistically dependent variables, predicting the future of one of the variables (the target stream) can benefit from information about both its history and the history of the other variable (the source stream). For example, fluctuations in temperature at a weather station can be predicted using both temperatures and barometric readings. However, a challenge when modelling such data is that it is easy for a neural network to rely on the greatest joint correlations within the target stream, which may ignore a crucial but small information transfer from the source to the target stream. As well, there are often situations where the target stream may have previously been modelled independently and it would be useful to use that model to inform a new joint model. Here, we develop an information bottleneck approach for conditional learning on two dependent streams of data. Our method, which we call Transfer Entropy Bottleneck (TEB), allows one to learn a model that bottlenecks the directed information transferred from the source variable to the target variable, while quantifying this information transfer within the model. As such, TEB provides a useful new information bottleneck approach for modelling two statistically dependent streams of data in order to make predictions about one of them.
How gradient estimator variance and bias impact learning in neural networks
Arna Ghosh
Yuhan Helena Liu
Konrad Paul Kording
There is growing interest in understanding how real brains may approximate gradients and how gradients can be used to train neuromorphic chips. However, neither real brains nor neuromorphic chips can perfectly follow the loss gradient, so parameter updates would necessarily use gradient estimators that have some variance and/or bias. Therefore, there is a need to understand better how variance and bias in gradient estimators impact learning dependent on network and task properties. Here, we show that variance and bias can impair learning on the training data, but some degree of variance and bias in a gradient estimator can be beneficial for generalization. We find that the ideal amount of variance and bias in a gradient estimator are dependent on several properties of the network and task: the size and activity sparsity of the network, the norm of the gradient, and the curvature of the loss landscape. As such, whether considering biologically-plausible learning algorithms or algorithms for training neuromorphic chips, researchers can analyze these properties to determine whether their approximation to gradient descent will be effective for learning given their network and task properties.
Formalizing locality for normative synaptic plasticity models
Colin Bredenberg
Ezekiel Williams
Cristina Savin
How gradient estimator variance and bias could impact learning in neural circuits
Arna Ghosh
Yuhan Helena Liu
Konrad Körding
There is growing interest in understanding how real brains may approximate gradients and how gradients can be used to train neuromorphic chips. However, neither real brains nor neuromorphic chips can perfectly follow the loss gradient, so parameter updates would necessarily use gradient estimators that have some variance and/or bias. Therefore, there is a need to understand better how variance and bias in gradient estimators impact learning dependent on network and task properties. Here, we show that variance and bias can impair learning on the training data, but some degree of variance and bias in a gradient estimator can be beneficial for generalization. We find that the ideal amount of variance and bias in a gradient estimator are dependent on several properties of the network and task: the size and activity sparsity of the network, the norm of the gradient, and the curvature of the loss landscape. As such, whether considering biologically-plausible learning algorithms or algorithms for training neuromorphic chips, researchers can analyze these properties to determine whether their approximation to gradient descent will be effective for learning given their network and task properties.
Stimulus information guides the emergence of behavior related signals in primary somatosensory cortex during learning
Mariangela Panniello
Colleen J Gillon
Roberto Maffulli
Marco Celotto
Stefano Panzeri
Michael M Kohl
Adult neurogenesis acts as a neural regularizer
Lina M. Tran
Adam Santoro
Lulu Liu
Sheena A. Josselyn
Paul W. Frankland
A Generalized Bootstrap Target for Value-Learning, Efficiently Combining Value and Feature Predictions
Anthony GX-Chen
Veronica Chelu