Blake Richards

Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science and Department of Neurology and Neurosurgery
Google
Research Topics
Computational Neuroscience
Generative Models
Reinforcement Learning
Representation Learning

Biography

Blake Richards is a Research Scientist Manager with the Paradigms of Intelligence team at Google, and an Associate Professor in the School of Computer Science and Department of Neurology and Neurosurgery at McGill University. He is also a Core Faculty Member at Mila.

Richards’ research lies at the intersection of neuroscience and AI. His laboratory investigates universal principles of intelligence that apply to both natural and artificial agents.

He has received several awards for his work, including the NSERC Arthur B. McDonald Fellowship in 2022, the Canadian Association for Neuroscience Young Investigator Award in 2019, and a Canada CIFAR AI Chair in 2018. Richards was a Banting Postdoctoral Fellow at SickKids Hospital from 2011 to 2013.

He obtained his PhD in neuroscience from the University of Oxford in 2010, and his BSc in cognitive science and AI from the University of Toronto in 2004.

Current Students

Postdoctorate - McGill University
Postdoctorate - Université de Montréal
Principal supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Independent visiting researcher - NYU
PhD - McGill University
Principal supervisor :
PhD - McGill University
Collaborating Alumni - McGill University
Undergraduate - McGill University
PhD - McGill University
Postdoctorate - McGill University
Co-supervisor :
Independent visiting researcher - Université de Montréal
Collaborating Alumni - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University
Postdoctorate - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University
PhD - Université de Montréal
Principal supervisor :
Collaborating Alumni - McGill University
Independent visiting researcher - Université de Montréal
PhD - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
Master's Research - McGill University
Independent visiting researcher - NA
Master's Research - McGill University
PhD - McGill University
Master's Research - McGill University
Co-supervisor :
Independent visiting researcher - York University
PhD - Concordia University
Principal supervisor :

Publications

Fast burst fraction transients convey information independent of the firing rate
Richard Naud
Xingyun Wang
Zachary Friedenberger
Jiyun N. Shin
Jean-Claude Béïque
Moritz Drüke
Matthew E. Larkum
Guy Doron
Theories of attention and learning have hypothesized a central role for high-frequency bursting in cognitive functions, but experimental reports of burst-mediated representations in vivo have been limited. Here we used a novel demultiplexing approach by considering a conjunctive burst code. We studied this code in vivo while animals learned to report direct electrical stimulation of the somatosensory cortex and found two acquired yet independent representations. One code, the event rate, showed a sparse and succinct stimulus representation and a small modulation upon detection errors. The other code, the burst fraction, correlated more globally with stimulation and more promptly responded to detection errors. Bursting modulation was potent and its time course evolved, even in cells that were considered unresponsive based on the firing rate. During the later stages of training, this modulation in bursting happened earlier, gradually aligning temporally with the representation in event rate. The alignment of bursting and event rate modulation sharpened the firing rate response, and was strongly associated with behavioral accuracy. Thus a fine-grained separation of spike timing patterns reveals two signals that accompany stimulus representations: an error signal that can be essential to guide learning and a sharpening signal that could implement attention mechanisms.
Sufficient conditions for offline reactivation in recurrent neural networks
During periods of quiescence, such as sleep, neural activity in many brain circuits resembles that observed during periods of task engagement. However, the precise conditions under which task-optimized networks can autonomously reactivate the same network states responsible for online behavior are poorly understood. In this study, we develop a mathematical framework that outlines sufficient conditions for the emergence of neural reactivation in circuits that encode features of smoothly varying stimuli. We demonstrate mathematically that noisy recurrent networks optimized to track environmental state variables using change-based sensory information naturally develop denoising dynamics, which, in the absence of input, cause the network to revisit state configurations observed during periods of online activity. We validate our findings using numerical experiments on two canonical neuroscience tasks: spatial position estimation based on self-motion cues, and head direction estimation based on angular velocity cues. Overall, our work provides theoretical support for modeling offline reactivation as an emergent consequence of task optimization in noisy neural circuits.
Synaptic Weight Distributions Depend on the Geometry of Plasticity
A growing literature in computational neuroscience leverages gradient descent and learning algorithms that approximate it to study synaptic plasticity in the brain. However, the vast majority of this work ignores a critical underlying assumption: the choice of distance for synaptic changes, i.e. the geometry of synaptic plasticity. Gradient descent assumes that the distance is Euclidean, but many other distances are possible, and there is no reason that biology necessarily uses Euclidean geometry. Here, using the theoretical tools provided by mirror descent, we show that the distribution of synaptic weights will depend on the geometry of synaptic plasticity. We use these results to show that experimentally observed log-normal weight distributions found in several brain areas are not consistent with standard gradient descent (i.e. a Euclidean geometry), but rather with non-Euclidean distances. Finally, we show that it should be possible to experimentally test for different synaptic geometries by comparing synaptic weight distributions before and after learning. Overall, our work shows that the current paradigm in theoretical work on synaptic plasticity that assumes Euclidean synaptic geometry may be misguided and that it should be possible to experimentally determine the true geometry of synaptic plasticity in the brain.
Addressing Sample Inefficiency in Multi-View Representation Learning
Harnessing small projectors and multiple views for efficient vision pretraining
Recent progress in self-supervised (SSL) visual representation learning has led to the development of several different proposed frameworks that rely on augmentations of images but use different loss functions. However, there are few theoretically grounded principles to guide practice, so practical implementation of each SSL framework requires several heuristics to achieve competitive performance. In this work, we build on recent analytical results to design practical recommendations for competitive and efficient SSL that are grounded in theory. Specifically, recent theory tells us that existing SSL frameworks are minimizing the same idealized loss, which is to learn features that best match the data similarity kernel defined by the augmentations used. We show how this idealized loss can be reformulated to a functionally equivalent loss that is more efficient to compute. We study the implicit bias of using gradient descent to minimize our reformulated loss function and find that using a stronger orthogonalization constraint with a reduced projector dimensionality should yield good representations. Furthermore, the theory tells us that approximating the reformulated loss should be improved by increasing the number of augmentations, and as such using multiple augmentations should lead to improved convergence. We empirically verify our findings on CIFAR, STL and ImageNet datasets, wherein we demonstrate an improved linear readout performance when training a ResNet-backbone using our theoretically grounded recommendations. Remarkably, we also demonstrate that by leveraging these insights, we can reduce the pretraining dataset size by up to 2
Temporal encoding in deep reinforcement learning agents
Ann Zixiang Huang
On the Information Geometry of Vision Transformers
On the Varied Faces of Overparameterization in Supervised and Self-Supervised Learning
Matteo Gamba
Agrawal
Hossein Azizpour
Mårten Björkman
The quality of the representations learned by neural networks depends on several factors, including the loss function, learning algorithm, and model architecture. In this work, we use information geometric measures to assess the representation quality in a principled manner. We demonstrate that the sensitivity of learned representations to input perturbations, measured by the spectral norm of the feature Jacobian, provides valuable information about downstream generalization. On the other hand, measuring the coefficient of spectral decay observed in the eigenspectrum of feature covariance provides insights into the global representation geometry. First, we empirically establish an equivalence between these notions of representation quality and show that they are inversely correlated. Second, our analysis reveals the varying roles that overparameterization plays in improving generalization. Unlike supervised learning, we observe that increasing model width leads to higher discriminability and less smoothness in the self-supervised regime. Furthermore, we report that there is no observable double descent phenomenon in SSL with non-contrastive objectives for commonly used parameterization regimes, which opens up new opportunities for tight asymptotic analysis. Taken together, our results provide a loss-aware characterization of the different role of overparameterization in supervised and self-supervised learning.
Learning from unexpected events in the neocortical microcircuit
Colleen J Gillon
Jason E. Pina
Jérôme A. Lecoq
Ruweida Ahmed
Yazan N. Billeh
Shiella Caldejon
Peter Groblewski
Timothy M. Henley
India Kato
Eric Lee
Jennifer Luviano
Kyla Mace
Chelsea Nayan
Thuyanh V. Nguyen
Kat North
Jed Perkins
Sam Seid
Matthew T. Valley
Ali Williford
Timothy P. Lillicrap
Responses to Pattern-Violating Visual Stimuli Evolve Differently Over Days in Somata and Distal Apical Dendrites
Colleen J Gillon
Jason E. Pina
Jérôme A. Lecoq
Ruweida Ahmed
Yazan N. Billeh
Shiella Caldejon
Peter Groblewski
Timothy M. Henley
India Kato
Eric Lee
Jennifer Luviano
Kyla Mace
Chelsea Nayan
Thuyanh V. Nguyen
Kat North
Jed Perkins
Sam Seid
Matthew T. Valley
Ali Williford
Timothy P. Lillicrap
Scientists have long conjectured that the neocortex learns patterns in sensory data to generate top-down predictions of upcoming stimuli. In line with this conjecture, different responses to pattern-matching vs pattern-violating visual stimuli have been observed in both spiking and somatic calcium imaging data. However, it remains unknown whether these pattern-violation signals are different between the distal apical dendrites, which are heavily targeted by top-down signals, and the somata, where bottom-up information is primarily integrated. Furthermore, it is unknown how responses to pattern-violating stimuli evolve over time as an animal gains more experience with them. Here, we address these unanswered questions by analyzing responses of individual somata and dendritic branches of layer 2/3 and layer 5 pyramidal neurons tracked over multiple days in primary visual cortex of awake, behaving female and male mice. We use sequences of Gabor patches with patterns in their orientations to create pattern-matching and pattern-violating stimuli, and two-photon calcium imaging to record neuronal responses. Many neurons in both layers show large differences between their responses to pattern-matching and pattern-violating stimuli. Interestingly, these responses evolve in opposite directions in the somata and distal apical dendrites, with somata becoming less sensitive to pattern-violating stimuli and distal apical dendrites more sensitive. These differences between the somata and distal apical dendrites may be important for hierarchical computation of sensory predictions and learning, since these two compartments tend to receive bottom-up and top-down information, respectively.