Blake Richards

Biography

Blake Richards is an associate professor in the School of Computer Science and the Department of Neurology and Neurosurgery at McGill University, and a core academic member of Mila – Quebec Artificial Intelligence Institute.

Richards’ research lies at the intersection of neuroscience and AI. His laboratory investigates universal principles of intelligence that apply to both natural and artificial agents.

He has received several awards for his work, including the NSERC Arthur B. McDonald Fellowship in 2022, the Canadian Association for Neuroscience Young Investigator Award in 2019, and a Canada CIFAR AI Chair in 2018. Richards was a Banting Postdoctoral Fellow at SickKids Hospital from 2011 to 2013.

He obtained his PhD in neuroscience from the University of Oxford in 2010, and his BSc in cognitive science and AI from the University of Toronto in 2004.

Current Students

Aliya Affdal

Research Intern - Université de Montréal

Benjamin Alsbury-Nealy

PhD - McGill University

benjamin@silicolabs.ca

Antoine Boudreau LeBlanc

Postdoctorate - McGill University

Colin Bredenberg

Postdoctorate - Université de Montréal

Principal supervisor:

Ethan Caballero

PhD - McGill University

Co-supervisor:

PhD - McGill University

aleksei.efremov@mail.mcgill.ca

Raymond Chua

PhD - McGill University

Principal supervisor:

PhD - McGill University

Jonathan Cornford

Postdoctorate - McGill University

Alex Efremov

PhD - McGill University

Arna Ghosh

PhD - McGill University

Adel Halawa

PhD - McGill University

Danny Han

Independent visiting researcher - Seoul National University

Roy Henha Eyono

PhD - McGill University

Ann Huang

Collaborating Alumni

Sonia Joseph

PhD - McGill University

Clara Kümpel

Independent visiting researcher - ETH Zurich

Divyansha Lachi

Collaborating researcher - Georgia Tech

Postdoctorate - McGill University

zixuan.li3@mail.mcgill.ca

Zixuan Li

Undergraduate - McGill University

Dongyan Lin

PhD - McGill University

Master's Research - McGill University

Abdel Mfougouon Njupoun

PhD - Université de Montréal

Principal supervisor:

Collaborating Alumni - McGill University

Adrien Peyrache

Independent visiting researcher

Roman Pogodin

Collaborating Alumni - McGill University

Co-supervisor:

PhD - McGill University

Co-supervisor: Irina Rish

Ali Saheb Pasand

PhD - McGill University

Co-supervisor: Pablo Samuel Castro

Mandana Samiei

PhD - McGill University

Principal supervisor: Doina Precup

samiemandana@gmail.com

Aidan Sirbu

Master's Research - McGill University

Co-supervisor: Shahab Bakhtiari

Hiro Tanabe

Independent visiting researcher

Josh Tindall

Master's Research - McGill University

Mashbayar Tugsbayar

PhD - McGill University

Charlotte Volk

Master's Research - McGill University

Co-supervisor: Shahab Bakhtiari

Hee-Hwan Wang

Independent visiting researcher - Seoul National University

Maren Wehrheim

Independent visiting researcher - York University

Mohammad Yaghoubi

PhD - McGill University

Machine Learning for the Segmentation of Different Nerve Fibre Activations from Brain-to-body Neural Signals

AmirHossein Zamani

PhD - Concordia University

Principal supervisor:

Blog Posts

Graphical representation of a vagus nerve

May 21, 2025

Param Raval

Olivier Tessier-Larivière

Pascal Fortier-Poisson

Blake Richards

Guillaume Lajoie

June 13, 2024

What Do Synaptic Weight Distributions Tell Us About Learning in the Brain?

Roman Pogodin

Jonathan Cornford

Arna Ghosh

Gauthier Gidel

Guillaume Lajoie

Blake Richards

August 29, 2023

α-ReQ: Assessing Representation Quality in SSL

KK Agrawal

Arnab Kumar-Mondal

Arna Ghosh

Blake A. Richards

Publications

Learning better with Dale’s Law: A Spectral Perspective

Pingsheng Li

Jonathan Cornford

Arna Ghosh

A Unified, Scalable Framework for Neural Population Decoding

Mehdi Azabou

Vinam Arora

Venkataramana Ganesh

Ximeng Mao

Santosh B Nachimuthu

Michael Jacob Mendelson

Matt Perich

Eva L Dyer

Our ability to use deep learning approaches to decipher neural activity would likely benefit from greater scale, in terms of both the model size and the datasets. However, the integration of many neural recordings into one unified model is challenging, as each recording contains the activity of different neurons from different individual animals. In this paper, we introduce a training framework and architecture designed to model the population dynamics of neural activity across diverse, large-scale neural recordings. Our method first tokenizes individual spikes within the dataset to build an efficient representation of neural events that captures the fine temporal structure of neural activity. We then employ cross-attention and a PerceiverIO backbone to further construct a latent tokenization of neural population activities. Utilizing this architecture and training framework, we construct a large-scale multi-session model trained on large datasets from seven nonhuman primates, spanning over 158 different sessions of recording from over 27,373 neural units and over 100 hours of recordings. In a number of different tasks, we demonstrate that our pretrained model can be rapidly adapted to new, unseen sessions with unspecified neuron correspondence, enabling few-shot performance with minimal labels. This work presents a powerful new approach for building deep learning tools to analyze neural data and stakes out a clear path to training at scale for neural decoding models.

The neuroconnectionist research programme

Adrien C. Doerig

R. Sommers

Katja Seeliger

J. Ismael

Grace W. Lindsay

Konrad Paul Kording

Talia Konkle

M. Gerven

Nikolaus Kriegeskorte

Tim Kietzmann

2023-05-30

Nature Reviews Neuroscience (published)

arxiv.org

Responses of pyramidal cell somata and apical dendrites in mouse visual cortex over multiple days

Colleen J Gillon

Jérôme A. Lecoq

Jason E. Pina

Ruweida Ahmed

Yazan N. Billeh

Shiella Caldejon

Peter Groblewski

Timothy M. Henley

India Kato

Eric Lee

Jennifer Luviano

Kyla Mace

Chelsea Nayan

Thuyanh V. Nguyen

Kat North

Jed Perkins

Sam Seid

Matthew T. Valley

Ali Williford

Yoshua Bengio

Timothy P. Lillicrap

Joel Zylberberg

2023-05-17

Scientific Data (published)

The study of plasticity has always been about gradients

Konrad Paul Kording

2023-05-01

The Journal of Physiology (published)

Catalyzing next-generation Artificial Intelligence through NeuroAI

Anthony Zador

Sean Escola

Bence Ölveczky

Yoshua Bengio

Kwabena Boahen

Matthew Botvinick

Dmitri Chklovskii

Anne Churchland

Claudia Clopath

James DiCarlo

Surya Ganguli

Jeff Hawkins

Konrad Paul Kording

Alexei Koulakov

Yann LeCun

Timothy P. Lillicrap

Adam Marblestone

Bruno Olshausen

Alexandre Pouget

Cristina Savin

Terrence Sejnowski

Eero Simoncelli

Sara Solla

David Sussillo

Andreas S. Tolias

Doris Tsao

2023-03-22

Nature Communications (published)

Transfer Entropy Bottleneck: Learning Sequence to Sequence Information Transfer

Damjan Kalajdzievski

Ximeng Mao

Pascal Fortier-Poisson

When presented with a data stream of two statistically dependent variables, predicting the future of one of the variables (the target stream) can benefit from information about both its history and the history of the other variable (the source stream). For example, fluctuations in temperature at a weather station can be predicted using both temperatures and barometric readings. However, a challenge when modelling such data is that it is easy for a neural network to rely on the greatest joint correlations within the target stream, which may ignore a crucial but small information transfer from the source to the target stream. As well, there are often situations where the target stream may have previously been modelled independently and it would be useful to use that model to inform a new joint model. Here, we develop an information bottleneck approach for conditional learning on two dependent streams of data. Our method, which we call Transfer Entropy Bottleneck (TEB), allows one to learn a model that bottlenecks the directed information transferred from the source variable to the target variable, while quantifying this information transfer within the model. As such, TEB provides a useful new information bottleneck approach for modelling two statistically dependent streams of data in order to make predictions about one of them.

2023-03-08

TMLR (accepted)

How gradient estimator variance and bias impact learning in neural networks

Arna Ghosh

Yuhan Helena Liu

Konrad Paul Kording

There is growing interest in understanding how real brains may approximate gradients and how gradients can be used to train neuromorphic chips. However, neither real brains nor neuromorphic chips can perfectly follow the loss gradient, so parameter updates would necessarily use gradient estimators that have some variance and/or bias. Therefore, there is a need to understand better how variance and bias in gradient estimators impact learning dependent on network and task properties. Here, we show that variance and bias can impair learning on the training data, but some degree of variance and bias in a gradient estimator can be beneficial for generalization. We find that the ideal amount of variance and bias in a gradient estimator are dependent on several properties of the network and task: the size and activity sparsity of the network, the norm of the gradient, and the curvature of the loss landscape. As such, whether considering biologically-plausible learning algorithms or algorithms for training neuromorphic chips, researchers can analyze these properties to determine whether their approximation to gradient descent will be effective for learning given their network and task properties.

2023-02-01

ICLR.cc/2023/Conference (poster)
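
The abstract above on gradient estimator variance and bias can be illustrated with a toy experiment. The following is a minimal sketch, not the paper's method: it assumes a simple quadratic loss, and the function name `noisy_gradient_descent` and its parameters (`bias`, `noise_std`, `lr`, `dim`) are hypothetical choices made for illustration. It only shows the first claim, that bias and variance in the estimator can impair fitting on the training objective; it says nothing about generalization.

```python
import numpy as np


def noisy_gradient_descent(bias=0.0, noise_std=0.0, steps=200, lr=0.1, dim=10, seed=0):
    """Minimise L(w) = 0.5 * ||w||^2 with a corrupted gradient estimate.

    The estimator is (true gradient + constant bias + Gaussian noise).
    Returns the final loss. All parameter names are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    w = np.ones(dim)
    for _ in range(steps):
        true_grad = w  # exact gradient of 0.5 * ||w||^2
        estimate = true_grad + bias + noise_std * rng.standard_normal(dim)
        w = w - lr * estimate
    return 0.5 * float(w @ w)


exact_loss = noisy_gradient_descent()                 # unbiased, zero variance
biased_loss = noisy_gradient_descent(bias=0.5)        # constant offset on every coordinate
noisy_loss = noisy_gradient_descent(noise_std=1.0)    # zero-mean Gaussian noise

print(f"exact:  {exact_loss:.3e}")
print(f"biased: {biased_loss:.3e}")
print(f"noisy:  {noisy_loss:.3e}")
```

In this toy setting the exact gradient drives the loss to essentially zero, the biased estimator converges to the wrong fixed point (w = -bias, so the loss settles near 0.5 * dim * bias², which is 1.25 for the values used here), and the noisy estimator keeps fluctuating around the minimum rather than reaching it.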