Portrait of Blake Richards

Blake Richards

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science and Department of Neurology and Neurosurgery
Google
Research Topics
Computational Neuroscience
Generative Models
Reinforcement Learning
Representation Learning

Biography

Blake Richards is Research Scientist Manager with the Paradigms of Intelligence team at Google, and an Associate Professor in the School of Computer Science and Department of Neurology and Neurosurgery at McGill University. He is also a Core Faculty Member at Mila.

Richards’ research lies at the intersection of neuroscience and AI. His laboratory investigates universal principles of intelligence that apply to both natural and artificial agents.

He has received several awards for his work, including the NSERC Arthur B. McDonald Fellowship in 2022, the Canadian Association for Neuroscience Young Investigator Award in 2019, and a Canada CIFAR AI Chair in 2018. Richards was a Banting Postdoctoral Fellow at SickKids Hospital from 2011 to 2013.

He obtained his PhD in neuroscience from the University of Oxford in 2010, and his BSc in cognitive science and AI from the University of Toronto in 2004.

Current Students

Postdoctorate - McGill University
Postdoctorate - Université de Montréal
Principal supervisor :
PhD - McGill University
PhD - McGill University
Principal supervisor :
PhD - McGill University
Collaborating Alumni - McGill University
Undergraduate - McGill University
PhD - McGill University
Postdoctorate - McGill University
Co-supervisor :
Independent visiting researcher - Université de Montréal
Collaborating Alumni - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University
Collaborating Alumni - McGill University
PhD - McGill University
Co-supervisor :
PhD - McGill University
PhD - McGill University
PhD - Université de Montréal
Principal supervisor :
Master's Research - McGill University
Principal supervisor :
Independent visiting researcher - Université de Montréal
PhD - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
Master's Research - McGill University
Independent visiting researcher - NA
Collaborating Alumni - McGill University
PhD - McGill University
Master's Research - McGill University
Co-supervisor :
Independent visiting researcher - York University
PhD - Concordia University
Principal supervisor :

Publications

Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
Seijin Kobayashi
Yanick Schimpf
Maximilian Schlegel
Angelika Steger
Maciej Wolczyk
Johannes Von Oswald
Kaitlin Maile
Rif A. Saurous
James Manyika
Blaise Aguera y Arcas
Alexander Meulemans
João Sacramento
Large-scale autoregressive models pretrained on next-token prediction and finetuned with reinforcement learning (RL) have achieved unprecede… (see more)nted success on many problem domains. During RL, these models explore by generating new outputs, one token at a time. However, sampling actions token-by-token can result in highly inefficient learning, particularly when rewards are sparse. Here, we show that it is possible to overcome this problem by acting and exploring within the internal representations of an autoregressive model. Specifically, to discover temporally-abstract actions, we introduce a higher-order, non-causal sequence model whose outputs control the residual stream activations of a base autoregressive model. On grid world and MuJoCo-based tasks with hierarchical structure, we find that the higher-order model learns to compress long activation sequence chunks onto internal controllers. Critically, each controller executes a sequence of behaviorally meaningful actions that unfold over long timescales and are accompanied with a learned termination condition, such that composing multiple controllers over time leads to efficient exploration on novel tasks. We show that direct internal controller reinforcement, a process we term "internal RL", enables learning from sparse rewards in cases where standard RL finetuning fails. Our results demonstrate the benefits of latent action generation and reinforcement in autoregressive models, suggesting internal RL as a promising avenue for realizing hierarchical RL within foundation models.
Embedded Universal Predictive Intelligence: a coherent framework for multi-agent learning
Alexander Meulemans
Rajai Nasser
Maciej Wolczyk
Marissa A. Weis
Seijin Kobayashi
Angelika Steger
Marcus Hutter
James Manyika
Rif A. Saurous
João Sacramento
Blaise Aguera y Arcas
Know Thyself by Knowing Others: Learning Neuron Identity from Population Context
Vinam Arora
Ian J. Knight
Mehdi Azabou
Cole Hurwitz
Joshua H. Siegle
Eva L. Dyer
Identifying the functional identity of individual neurons is essential for interpreting circuit dynamics, yet it remains a major challenge i… (see more)n large-scale _in vivo_ recordings where anatomical and molecular labels are often unavailable. Here we introduce NuCLR, a self-supervised framework that learns context-aware representations of neuron identity by modeling each neuron's role within the broader population. NuCLR employs a spatio-temporal transformer that captures both within-neuron dynamics and across-neuron interactions. It is trained with a sample-wise contrastive objective that encourages temporally-stable and discriminative embeddings. Across multiple open-access datasets, NuCLR outperforms prior methods in both cell type and brain region classification. Critically, it exhibits strong zero-shot generalization to entirely new populations, without any retraining or access to stimulus labels. Furthermore, we demonstrate that our framework scales effectively with data size. Overall, our results demonstrate that modeling population context is crucial for understanding neuron identity and that rich signal for cell-typing and neuron localization is present in neural activity alone.Code available at: https://github.com/nerdslab/nuclr.
Why all roads don't lead to Rome: Representation geometry varies across the human visual cortical hierarchy
Why all roads don't lead to Rome: Representation geometry varies across the human visual cortical hierarchy
Biological and artificial intelligence systems navigate the fundamental efficiency-robustness tradeoff for optimal encoding, i.e., they must… (see more) efficiently encode numerous attributes of the input space while also being robust to noise. This challenge is particularly evident in hierarchical processing systems like the human brain. With a view towards understanding how systems navigate the efficiency-robustness tradeoff, we turned to a population geometry framework for analyzing representations in the human visual cortex alongside artificial neural networks (ANNs). In the ventral visual stream, we found general-purpose, scale-free representations characterized by a power law-decaying eigenspectrum in most areas. However, in certain higher-order visual areas did not have scale-free representations, indicating that scale-free geometry is not a universal property of the brain. In parallel, ANNs trained with a self-supervised learning objective also exhibited free-free geometry, but not after fine-tune on a specific task. Based on these empirical results and our analytical insights, we posit that a system's representation geometry is not a universal property and instead depends upon the computational objective.
Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists?
Anthony GX-Chen
Rob Fergus
Kenneth Marino
Language model (LM) agents are increasingly used as autonomous decision-makers who need to actively gather information to guide their decisi… (see more)ons. A crucial cognitive skill for such agents is the efficient exploration and understanding of the causal structure of the world -- key to robust, scientifically grounded reasoning. Yet, it remains unclear whether LMs possess this capability or exhibit systematic biases leading to erroneous conclusions. In this work, we examine LMs' ability to explore and infer causal relationships, using the well-established"Blicket Test"paradigm from developmental psychology. We find that LMs reliably infer the common, intuitive disjunctive causal relationships but systematically struggle with the unusual, yet equally (or sometimes even more) evidenced conjunctive ones. This"disjunctive bias"persists across model families, sizes, and prompting strategies, and performance further declines as task complexity increases. Interestingly, an analogous bias appears in human adults, suggesting that LMs may have inherited deep-seated reasoning heuristics from their training data. To this end, we quantify similarities between LMs and humans, finding that LMs exhibit adult-like inference profiles (but not children-like). Finally, we propose a test-time sampling method which explicitly samples and eliminates hypotheses about causal relationships from the LM. This scalable approach significantly reduces the disjunctive bias and moves LMs closer to the goal of scientific, causally rigorous reasoning.
Learning to combine top-down context and feed-forward representations under ambiguity with apical and basal dendrites
Guillaume Etter
Busra Tugce Gurbuz
The challenge of hidden gifts in multi-agent reinforcement learning
Cooperation between people is not always obvious. Sometimes we benefit from actions that others have taken even when we are unaware that the… (see more)y took those actions. For example, if your neighbor chooses not to take a parking spot in front of your house when you are not there, you can benefit, even without being aware that they took this action. These “hidden gifts” represent an interesting challenge for multi-agent reinforcement learning (MARL), since assigning credit to your own actions correctly when the beneficial actions of others are hidden is non-trivial. Here, we study the impact of hidden gifts with a very simple MARL task. In this task, agents in a grid-world environment have individual doors to unlock in order to obtain individual rewards. As well, if all the agents unlock their door the group receives a larger collective reward. However, there is only one key for all of the doors, such that the collective reward can only be obtained when the agents drop the key for others after they use it. Notably, there is nothing to indicate to an agent that the other agents have dropped the key, thus the act of dropping the key for others is a “hidden gift”. We show that several different state-of-the-art RL algorithms, including MARL algorithms, fail to learn how to obtain the collective reward in this simple task. Interestingly, we find that independent model-free policy gradient agents can solve the task when we provide them with information about their action history, but MARL agents still cannot solve the task with action history. Finally, we derive a correction term for these independent agents, inspired by learning aware approaches, which reduces the variance in learning and helps them to converge to collective success more reliably. These results show how credit assignment in multi-agent settings can be particularly challenging in the presence of “hidden gifts”, and demonstrate that learning awareness can benefit these settings
The challenge of hidden gifts in multi-agent reinforcement learning
Cooperation between people is not always obvious. Sometimes we benefit from actions that others have taken even when we are unaware that the… (see more)y took those actions. For example, if your neighbor chooses not to take a parking spot in front of your house when you are not there, you can benefit, even without being aware that they took this action. These “hidden gifts” represent an interesting challenge for multi-agent reinforcement learning (MARL), since assigning credit to your own actions correctly when the beneficial actions of others are hidden is non-trivial. Here, we study the impact of hidden gifts with a very simple MARL task. In this task, agents in a grid-world environment have individual doors to unlock in order to obtain individual rewards. As well, if all the agents unlock their door the group receives a larger collective reward. However, there is only one key for all of the doors, such that the collective reward can only be obtained when the agents drop the key for others after they use it. Notably, there is nothing to indicate to an agent that the other agents have dropped the key, thus the act of dropping the key for others is a “hidden gift”. We show that several different state-of-the-art RL algorithms, including MARL algorithms, fail to learn how to obtain the collective reward in this simple task. Interestingly, we find that independent model-free policy gradient agents can solve the task when we provide them with information about their action history, but MARL agents still cannot solve the task with action history. Finally, we derive a correction term for these independent agents, inspired by learning aware approaches, which reduces the variance in learning and helps them to converge to collective success more reliably. These results show how credit assignment in multi-agent settings can be particularly challenging in the presence of “hidden gifts”, and demonstrate that learning awareness can benefit these settings
Tracing the representation geometry of language models from pretraining to post-training
The geometry of representations in a neural network can significantly impact downstream generalization. It is unknown how representation geo… (see more)metry changes in large language models (LLMs) over pretraining and post-training. Here, we characterize the evolving geometry of LLM representations using spectral methods (effective rank and eigenspectrum decay). With the OLMo and Pythia model families we uncover a consistent non-monotonic sequence of three distinct geometric phases in pretraining. An initial \warmup phase sees rapid representational compression. This is followed by an "entropy-seeking" phase, characterized by expansion of the representation manifold's effective dimensionality, which correlates with an increase in memorization. Subsequently, a "compression seeking" phase imposes anisotropic consolidation, selectively preserving variance along dominant eigendirections while contracting others, correlating with improved downstream task performance. We link the emergence of these phases to the fundamental interplay of cross-entropy optimization, information bottleneck, and skewed data distribution. Additionally, we find that in post-training the representation geometry is further transformed: Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) correlate with another "entropy-seeking" dynamic to integrate specific instructional or preferential data, reducing out-of-distribution robustness. Conversely, Reinforcement Learning with Verifiable Rewards (RLVR) often exhibits a "compression seeking" dynamic, consolidating reward-aligned behaviors and reducing the entropy in its output distribution. This work establishes the utility of spectral measures of representation geometry for understanding the multiphase learning dynamics within LLMs.
Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists?
Anthony GX-Chen
Rob Fergus
Kenneth Marino
Steering CLIP's vision transformer with sparse autoencoders
Ethan Goldfarb
Lorenz Hufe
Yossi Gandelsman
Robert Graham
Wojciech Samek
While vision models are highly capable, their internal mechanisms remain poorly understood-- a challenge which sparse autoencoders (SAEs) ha… (see more)ve helped address in language, but which remains underexplored in vision. We address this gap by training SAEs on CLIP's vision transformer and uncover key differences between vision and language processing, including distinct sparsity patterns for SAEs trained across layers and token types. We then provide the first systematic analysis of the steerability of CLIP's vision transformer by introducing metrics to quantify how precisely SAE features can be steered to affect the model's output. We find that 10-15% of neurons and features are steerable, with SAEs providing thousands more steerable features than the base model. Through targeted suppression of SAE features, we then demonstrate improved performance on three vision disentanglement tasks (CelebA, Waterbirds, and typographic attacks), finding optimal disentanglement in middle model layers, and achieving state-of-the-art performance on defense against typographic attacks. We release our CLIP SAE models and code to support future research in vision transformer interpretability.