Portrait de Blake Richards

Blake Richards

Membre académique principal
Chaire en IA Canada-CIFAR
McGill University, École d'informatique et Département de neurologie et de neurochirurgie
Google
Sujets de recherche
Apprentissage de représentations
Apprentissage par renforcement
Modèles génératifs
Neurosciences computationnelles

Biographie

Blake Richards est directeur de recherche au sein de l'équipe Paradigms of Intelligence chez Google et professeur agrégé à l'École d'informatique et au Département de neurologie et de neurochirurgie de l'Université McGill. Il est également et membre académique principal à Mila - Institut québécois d'intelligence artificielle.

Ses recherches se situent à l'intersection des neurosciences et de l'intelligence artificielle. Son laboratoire étudie les principes universels de l'intelligence qui s'appliquent aux agents naturels et artificiels.

Il a reçu plusieurs distinctions pour ses travaux, notamment une bourse Arthur-B.-McDonald du Conseil de recherches en sciences naturelles et en génie du Canada (CRSNG) en 2022, le Prix du jeune chercheur de l'Association canadienne des neurosciences en 2019 et une chaire en IA Canada-CIFAR en 2018. M. Richards a en outre été titulaire d'une bourse postdoctorale Banting à l'hôpital SickKids de 2011 à 2013. Il a obtenu un doctorat en neurosciences de l'Université d'Oxford en 2010 et une licence en sciences cognitives et en IA de l'Université de Toronto en 2004.

Étudiants actuels

Collaborateur·rice de recherche - UdeM
Postdoctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - McGill
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - McGill
Collaborateur·rice alumni - McGill
Baccalauréat - McGill
Doctorat - McGill
Postdoctorat - McGill
Co-superviseur⋅e :
Visiteur de recherche indépendant - UdeM
Collaborateur·rice alumni - McGill
Doctorat - McGill
Doctorat - McGill
Collaborateur·rice alumni - McGill
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - McGill
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Maîtrise recherche - McGill
Superviseur⋅e principal⋅e :
Visiteur de recherche indépendant - Université de Montréal
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - McGill
Superviseur⋅e principal⋅e :
Maîtrise recherche - McGill
Visiteur de recherche indépendant - NA
Collaborateur·rice alumni - McGill
Doctorat - McGill
Doctorat - McGill
Co-superviseur⋅e :
Visiteur de recherche indépendant - York University
Doctorat - Concordia
Superviseur⋅e principal⋅e :

Publications

Contrastive Retrospection: honing in on critical steps for rapid learning and generalization in RL
In real life, success is often contingent upon multiple critical steps that are distant in time from each other and from the final reward. T… (voir plus)hese critical steps are challenging to identify with traditional reinforcement learning (RL) methods that rely on the Bellman equation for credit assignment. Here, we present a new RL algorithm that uses offline contrastive learning to hone in on these critical steps. This algorithm, which we call Contrastive Retrospection (ConSpec), can be added to any existing RL algorithm. ConSpec learns a set of prototypes for the critical steps in a task by a novel contrastive loss and delivers an intrinsic reward when the current state matches one of the prototypes. The prototypes in ConSpec provide two key benefits for credit assignment: (i) They enable rapid identification of all the critical steps. (ii) They do so in a readily interpretable manner, enabling out-of-distribution generalization when sensory features are altered. Distinct from other contemporary RL approaches to credit assignment, ConSpec takes advantage of the fact that it is easier to retrospectively identify the small set of steps that success is contingent upon (and ignoring other states) than it is to prospectively predict reward at every taken step. ConSpec greatly improves learning in a diverse set of RL tasks. The code is available at the link: https://github.com/sunchipsster1/ConSpec
Learning better with Dale's Law: A Spectral Perspective
Most recurrent neural networks (RNNs) do not include a fundamental constraint of real neural circuits: Dale’s Law, which implies that neur… (voir plus)ons must be excitatory (E) or inhibitory (I). Dale’s Law is generally absent from RNNs because simply partitioning a standard network’s units into E and I populations impairs learning. However, here we extend a recent feedforward bio-inspired EI network architecture, named Dale’s ANNs, to recurrent networks, and demonstrate that good performance is possible while respecting Dale’s Law. This begs the question: What makes some forms of EI network learn poorly and others learn well? And, why does the simple approach of incorporating Dale’s Law impair learning? Historically the answer was thought to be the sign constraints on EI network parameters, and this was a motivation behind Dale’s ANNs. However, here we show the spectral properties of the recurrent weight matrix at initialisation are more impactful on network performance than sign constraints. We find that simple EI partitioning results in a singular value distribution that is multimodal and dispersed, whereas standard RNNs have an unimodal, more clustered singular value distribution, as do recurrent Dale’s ANNs. We also show that the spectral properties and performance of partitioned EI networks are worse for small networks with fewer I units, and we present normalised SVD entropy as a measure of spectrum pathology that correlates with performance. Overall, this work sheds light on a long-standing mystery in neuroscience-inspired AI and computational neuroscience, paving the way for greater alignment between neural networks and biology.
A Unified, Scalable Framework for Neural Population Decoding
Mehdi Azabou
Vinam Arora
Venkataramana Ganesh
Santosh Nachimuthu
Michael J. Mendelson
Matthew G. Perich
Eva L. Dyer
Our ability to use deep learning approaches to decipher neural activity would likely benefit from greater scale, in terms of both model size… (voir plus) and datasets. However, the integration of many neural recordings into one unified model is challenging, as each recording contains the activity of different neurons from different individual animals. In this paper, we introduce a training framework and architecture designed to model the population dynamics of neural activity across diverse, large-scale neural recordings. Our method first tokenizes individual spikes within the dataset to build an efficient representation of neural events that captures the fine temporal structure of neural activity. We then employ cross-attention and a PerceiverIO backbone to further construct a latent tokenization of neural population activities. Utilizing this architecture and training framework, we construct a large-scale multi-session model trained on large datasets from seven nonhuman primates, spanning over 158 different sessions of recording from over 27,373 neural units and over 100 hours of recordings. In a number of different tasks, we demonstrate that our pretrained model can be rapidly adapted to new, unseen sessions with unspecified neuron correspondence, enabling few-shot performance with minimal labels. This work presents a powerful new approach for building deep learning tools to analyze neural data and stakes out a clear path to training at scale.
Formalizing locality for normative synaptic plasticity models
Towards Scaling Difference Target Propagation by Learning Backprop Targets
The development of biologically-plausible learning algorithms is important for understanding learning in the brain, but most of them fail to… (voir plus) scale-up to real-world tasks, limiting their potential as explanations for learning by real brains. As such, it is important to explore learning algorithms that come with strong theoretical guarantees and can match the performance of backpropagation (BP) on complex tasks. One such algorithm is Difference Target Propagation (DTP), a biologically-plausible learning algorithm whose close relation with Gauss-Newton (GN) optimization has been recently established. However, the conditions under which this connection rigorously holds preclude layer-wise training of the feedback pathway synaptic weights (which is more biologically plausible). Moreover, good alignment between DTP weight updates and loss gradients is only loosely guaranteed and under very specific conditions for the architecture being trained. In this paper, we propose a novel feedback weight training scheme that ensures both that DTP approximates BP and that layer-wise feedback weight training can be restored without sacrificing any theoretical guarantees. Our theory is corroborated by experimental results and we report the best performance ever achieved by DTP on CIFAR-10 and ImageNet 32
On Neural Architecture Inductive Biases for Relational Tasks
Current deep learning approaches have shown good in-distribution generalization performance, but struggle with out-of-distribution generaliz… (voir plus)ation. This is especially true in the case of tasks involving abstract relations like recognizing rules in sequences, as we find in many intelligence tests. Recent work has explored how forcing relational representations to remain distinct from sensory representations, as it seems to be the case in the brain, can help artificial systems. Building on this work, we further explore and formalize the advantages afforded by 'partitioned' representations of relations and sensory details, and how this inductive bias can help recompose learned relational structure in newly encountered settings. We introduce a simple architecture based on similarity scores which we name Compositional Relational Network (CoRelNet). Using this model, we investigate a series of inductive biases that ensure abstract relations are learned and represented distinctly from sensory data, and explore their effects on out-of-distribution generalization for a series of relational psychophysics tasks. We find that simple architectural choices can outperform existing models in out-of-distribution generalization. Together, these results show that partitioning relational representations from other information streams may be a simple way to augment existing network architectures' robustness when performing out-of-distribution relational computations.
Learning to Live with Dale's Principle: ANNs with Separate Excitatory and Inhibitory Units
Marco Leite
Amélie Lamarquette
Dimitri M. Kullmann
A bstract The units in artificial neural networks (ANNs) can be thought of as abstraction… (voir plus)s of biological neurons, and ANNs are increasingly used in neuroscience research. However, there are many important differences between ANN units and real neurons. One of the most notable is the absence of Dale’s principle, which ensures that biological neurons are either exclusively excitatory or inhibitory. Dale’s principle is typically left out of ANNs because its inclusion impairs learning. This is problematic, because one of the great advantages of ANNs for neuroscience research is their ability to learn complicated, realistic tasks. Here, by taking inspiration from feedforward inhibitory interneurons in the brain we show that we can develop ANNs with separate populations of excitatory and inhibitory units that learn just as well as standard ANNs. We call these networks Dale’s ANNs (DANNs). We present two insights that enable DANNs to learn well: (1) DANNs are related to normalization schemes, and can be initialized such that the inhibition centres and standardizes the excitatory activity, (2) updates to inhibitory neuron parameters should be scaled using corrections based on the Fisher Information matrix. These results demonstrate how ANNs that respect Dale’s principle can be built without sacrificing learning performance, which is important for future work using ANNs as models of the brain. The results may also have interesting implications for how inhibitory plasticity in the real brain operates.
Adversarial Feature Desensitization
Neural networks are known to be vulnerable to adversarial attacks -- slight but carefully constructed perturbations of the inputs which can … (voir plus)drastically impair the network's performance. Many defense methods have been proposed for improving robustness of deep networks by training them on adversarially perturbed inputs. However, these models often remain vulnerable to new types of attacks not seen during training, and even to slightly stronger versions of previously seen attacks. In this work, we propose a novel approach to adversarial robustness, which builds upon the insights from the domain adaptation field. Our method, called Adversarial Feature Desensitization (AFD), aims at learning features that are invariant towards adversarial perturbations of the inputs. This is achieved through a game where we learn features that are both predictive and robust (insensitive to adversarial attacks), i.e. cannot be used to discriminate between natural and adversarial data. Empirical results on several benchmarks demonstrate the effectiveness of the proposed approach against a wide range of attack types and attack strengths. Our code is available at https://github.com/BashivanLab/afd.
Different scaling of linear models and deep learning in UKBiobank brain images versus machine-learning datasets
Marc-Andre Schulz
B. T. Thomas Yeo
Joshua T. Vogelstein
Janaina Mourao-Miranada
Jakob N. Kather
Konrad Kording
Recently, deep learning has unlocked unprecedented success in various domains, especially using images, text, and speech. However, deep lear… (voir plus)ning is only beneficial if the data have nonlinear relationships and if they are exploitable at available sample sizes. We systematically profiled the performance of deep, kernel, and linear models as a function of sample size on UKBiobank brain images against established machine learning references. On MNIST and Zalando Fashion, prediction accuracy consistently improves when escalating from linear models to shallow-nonlinear models, and further improves with deep-nonlinear models. In contrast, using structural or functional brain scans, simple linear models perform on par with more complex, highly parameterized models in age/sex prediction across increasing sample sizes. In sum, linear models keep improving as the sample size approaches ~10,000 subjects. Yet, nonlinearities for predicting common phenotypes from typical brain scans remain largely inaccessible to the examined kernel and deep learning methods.