Portrait of Blake Richards

Blake Richards

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science and Department of Neurology and Neurosurgery
Google
Research Topics
Computational Neuroscience
Generative Models
Reinforcement Learning
Representation Learning

Biography

Blake Richards is an associate professor at the School of Computer Science and in the Department of Neurology and Neurosurgery at McGill University, and a core academic member of Mila – Quebec Artificial Intelligence Institute.

Richards’ research lies at the intersection of neuroscience and AI. His laboratory investigates universal principles of intelligence that apply to both natural and artificial agents.

He has received several awards for his work, including the NSERC Arthur B. McDonald Fellowship in 2022, the Canadian Association for Neuroscience Young Investigator Award in 2019, and a Canada CIFAR AI Chair in 2018. Richards was a Banting Postdoctoral Fellow at SickKids Hospital from 2011 to 2013.

He obtained his PhD in neuroscience from the University of Oxford in 2010, and his BSc in cognitive science and AI from the University of Toronto in 2004.

Current Students

Independent visiting researcher - Seoul National University
Postdoctorate - McGill University
Postdoctorate - Université de Montréal
Principal supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
PhD - McGill University
Postdoctorate - McGill University
PhD - McGill University
Independent visiting researcher - Seoul National University
PhD - McGill University
Research Intern - McGill University
Collaborating Alumni
PhD - McGill University
Independent visiting researcher - ETH Zurich
Collaborating researcher - Georgia Tech
Postdoctorate - McGill University
Undergraduate - McGill University
PhD - McGill University
Master's Research - McGill University
PhD - Université de Montréal
Principal supervisor :
Undergraduate - McGill University
Master's Research - McGill University
Independent visiting researcher
Postdoctorate - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
Master's Research - McGill University
Co-supervisor :
Master's Research - McGill University
PhD - McGill University
Master's Research - McGill University
Co-supervisor :
Independent visiting researcher - Seoul National University
Independent visiting researcher - York University
PhD - McGill University
PhD - Concordia University
Principal supervisor :

Publications

Brain-like learning with exponentiated gradients
Jonathan Cornford
Roman Pogodin
Arna Ghosh
Kaiwen Sheng
Brendan A. Bicknell
Olivier Codol
Beverley A. Clark
Multi-agent cooperation through learning-aware policy gradients
Alexander Meulemans
Seijin Kobayashi
Johannes von Oswald
Nino Scherrer
Eric Elmoznino
Blaise Agüera y Arcas
João Sacramento
Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning. How can we achieve cooperation… (see more) among self-interested, independent learning agents? Promising recent work has shown that in certain tasks cooperation can be established between learning-aware agents who model the learning dynamics of each other. Here, we present the first unbiased, higher-derivative-free policy gradient algorithm for learning-aware reinforcement learning, which takes into account that other agents are themselves learning through trial and error based on multiple noisy trials. We then leverage efficient sequence models to condition behavior on long observation histories that contain traces of the learning dynamics of other agents. Training long-context policies with our algorithm leads to cooperative behavior and high returns on standard social dilemmas, including a challenging environment where temporally-extended action coordination is required. Finally, we derive from the iterated prisoner's dilemma a novel explanation for how and when cooperation arises among self-interested learning-aware agents.
Top-down feedback matters: Functional impact of brainlike connectivity motifs on audiovisual integration
Mashbayar Tugsbayar
Mingze Li
The oneirogen hypothesis: modeling the hallucinatory effects of classical psychedelics in terms of replay-dependent plasticity mechanisms
Colin Bredenberg
Fabrice Normandin
Harnessing small projectors and multiple views for efficient vision pretraining
Arna Ghosh
Kumar Krishna Agrawal
Shagun Sodhani
Learning Successor Features the Simple Way
Raymond Chua
Arna Ghosh
Christos Kaplanis
In Deep Reinforcement Learning (RL), it is a challenge to learn representations that do not exhibit catastrophic forgetting or interference … (see more)in non-stationary environments. Successor Features (SFs) offer a potential solution to this challenge. However, canonical techniques for learning SFs from pixel-level observations often lead to representation collapse, wherein representations degenerate and fail to capture meaningful variations in the data. More recent methods for learning SFs can avoid representation collapse, but they often involve complex losses and multiple learning phases, reducing their efficiency. We introduce a novel, simple method for learning SFs directly from pixels. Our approach uses a combination of a Temporal-difference (TD) loss and a reward prediction loss, which together capture the basic mathematical definition of SFs. We show that our approach matches or outperforms existing SF learning techniques in both 2D (Minigrid), 3D (Miniworld) mazes and Mujoco, for both single and continual learning scenarios. As well, our technique is efficient, and can reach higher levels of performance in less time than other approaches. Our work provides a new, streamlined technique for learning SFs directly from pixel observations, with no pretraining required.
Towards a "Universal Translator" for Neural Dynamics at Single-Cell, Single-Spike Resolution
Yizi Zhang
Yanchen Wang
Donato M. Jiménez-Benetó
Zixuan Wang
Mehdi Azabou
Renee Tung
Olivier Winter
International Brain Laboratory
Eva L Dyer
Liam Paninski
Cole Lincoln Hurwitz
Stochastic Wiring of Cell Types Enhances Fitness by Generating Phenotypic Variability
Divyansha Lachi
Ann Huang
Augustine N. Mavor-Parker
Arna Ghosh
Anthony Zador
The development of neural connectivity is a crucial biological process that gives rise to diverse brain circuits and behaviors. Neural devel… (see more)opment is a stochastic process, but this stochasticity is often treated as a nuisance to overcome rather than as a functional advantage. Here we use a computational model, in which connection probabilities between discrete cell types are genetically specified, to investigate the benefits of stochasticity in the development of neural wiring. We show that this model can be viewed as a generalization of a powerful class of artificial neural networks—Bayesian neural networks—where each network parameter is a sample from a distribution. Our results reveal that stochasticity confers a greater benefit in large networks and variable environments, which may explain its role in organisms with larger brains. Surprisingly, we find that the average fitness over a population of agents is higher than a single agent defined by the average connection probability. Our model reveals how developmental stochasticity, by inducing a form of non-heritable phenotypic variability, can increase the probability that at least some individuals will survive in rapidly changing, unpredictable environments. Our results suggest how stochasticity may be an important feature rather than a bug in neural development.
Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent
Karolis Jucys
George Adamopoulos
Mehrab Hamidi
Stephanie Milani
Mohammad Reza Samsami
Artem Zholus
Sonia Joseph
Özgür Şimşek
Understanding the mechanisms behind decisions taken by large foundation models in sequential tasks is critical to ensuring that such systems… (see more) operate transparently and safely. However, interpretability methods have not yet been applied extensively to large-scale agents based on reinforcement learning. In this work, we perform exploratory analysis on the Video PreTraining (VPT) Minecraft playing agent, one of the largest open-source vision-based agents. We try to illuminate its reasoning mechanisms by applying various interpretability techniques. First, we analyze the attention mechanism while the agent solves its training task --- crafting a diamond pickaxe. The agent seems to pay attention to the 4 last frames and several key-frames further back. This provides clues as to how it maintains coherence in the task that takes 3-10 minutes, despite the agent's short memory span of only six seconds. Second, we perform various interventions, which help us uncover a worrying case of goal misgeneralization: VPT mistakenly identifies a villager wearing brown clothes as a tree trunk and punches it to death, when positioned stationary under green tree leaves. We demonstrate similar misbehavior in a related agent (STEVE-1), which motivates the use of VPT as a model organism for large-scale vision-based agent interpretability.
Stimulus information guides the emergence of behavior-related signals in primary somatosensory cortex during learning.
Mariangela Panniello
Colleen J Gillon
Roberto Maffulli
Marco Celotto
Stefano Panzeri
Michael M Kohl
Sequential predictive learning is a unifying theory for hippocampal representation and replay
Daniel Levenstein
Aleksei Efremov
Roy Henha Eyono
Adrien Peyrache
The mammalian hippocampus contains a cognitive map that represents an animal’s position in the environment 1 and generates offline “repl… (see more)ay” 2,3 for the purposes of recall 4, planning 5,6, and forming long term memories 7. Recently, it’s been found that artificial neural networks trained to predict sensory inputs develop spatially tuned cells 8, aligning with predictive theories of hippocampal function 9–11. However, whether predictive learning can also account for the ability to produce offline replay is unknown. Here, we find that spatially tuned cells, which robustly emerge from all forms of predictive learning, do not guarantee the presence of a cognitive map with the ability to generate replay. Offline simulations only emerged in networks that used recurrent connections and head-direction information to predict multi-step observation sequences, which promoted the formation of a continuous attractor reflecting the geometry of the environment. These offline trajectories were able to show wake-like statistics, autonomously replay recently experienced locations, and could be directed by a virtual head direction signal. Further, we found that networks trained to make cyclical predictions of future observation sequences were able to rapidly learn a cognitive map and produced sweeping representations of future positions reminiscent of hippocampal theta sweeps 12. These results demonstrate how hippocampal-like representation and replay can emerge in neural networks engaged in predictive learning, and suggest that hippocampal theta sequences reflect a circuit that implements a data-efficient algorithm for sequential predictive learning. Together, this framework provides a unifying theory for hippocampal functions and hippocampal-inspired approaches to artificial intelligence.
Fast burst fraction transients convey information independent of the firing rate
Richard Naud
Xingyun Wang
Zachary Friedenberger
Alexandre Payeur
Jiyun N. Shin
Jean-Claude Béïque
Moritz Drüke
Matthew E. Larkum
Guy Doron
Theories of attention and learning have hypothesized a central role for high-frequency bursting in cognitive functions, but experimental rep… (see more)orts of burst-mediated representations in vivo have been limited. Here we used a novel demultiplexing approach by considering a conjunctive burst code. We studied this code in vivo while animals learned to report direct electrical stimulation of the somatosensory cortex and found two acquired yet independent representations. One code, the event rate, showed a sparse and succint stiumulus representation and a small modulation upon detection errors. The other code, the burst fraction, correlated more globally with stimulation and more promptly responded to detection errors. Bursting modulation was potent and its time course evolved, even in cells that were considered unresponsive based on the firing rate. During the later stages of training, this modulation in bursting happened earlier, gradually aligning temporally with the representation in event rate. The alignment of bursting and event rate modulation sharpened the firing rate response, and was strongly associated behavioral accuracy. Thus a fine-grained separation of spike timing patterns reveals two signals that accompany stimulus representations: an error signal that can be essential to guide learning and a sharpening signal that could implement attention mechanisms.