Guillaume Lajoie

Biography

Guillaume Lajoie is an Associate Professor in the Department of Mathematics and Statistics at Université de Montréal and a Core Academic Member of Mila – Quebec Artificial Intelligence Institute. He holds a Canada-CIFAR AI Research Chair and a Canada Research Chair (CRC) in Neural Computation and Interfacing.

His research is positioned at the intersection of AI and neuroscience, where he develops tools to better understand mechanisms of intelligence common to both biological and artificial systems. His research group's contributions range from advances in multi-scale learning paradigms for large artificial systems to applications in neurotechnology. Dr. Lajoie is actively involved in responsible AI development efforts, seeking to identify guidelines and best practices for the use of AI in research and beyond.

Current Students

Federico Arangath Joseph

Collaborating researcher - ETH Zurich

Rohan Banerjee

Collaborating Alumni - Polytechnique Montréal

Independent visiting researcher

Principal supervisor:

Yoshua Bengio

Sangnie Bhardwaj

PhD - Université de Montréal

Co-supervisor:

Hugo Larochelle

Colin Bredenberg

Postdoctorate - Université de Montréal

Co-supervisor:

Blake Richards

Leo Choiniere

PhD - Université de Montréal

Olivier Codol

Postdoctorate - Université de Montréal

Co-supervisor:

PhD - Université de Montréal

Principal supervisor:

Leo Gagnon

PhD - Université de Montréal

Tom George

Postdoctorate - McGill University

Principal supervisor:

Juan Guerra

Master's Research - Polytechnique Montréal

Principal supervisor:

Nanda Harishankar Krishna

PhD - Université de Montréal

Anna Jahn

Independent visiting researcher - McGill University

Chen Jiang

PhD - McGill University

Principal supervisor:

Paul Masset

Thomas Jiralerspong

PhD - Université de Montréal

Co-supervisor:

Master's Research - Université de Montréal

Co-supervisor:

PhD - McGill University

Principal supervisor:

Blake Richards

Mathys Loiselle

Research Intern - Concordia University

Co-supervisor:

Ximeng Mao

PhD - Université de Montréal

Co-supervisor:

Abdel Mfougouon Njupoun

PhD - Université de Montréal

Co-supervisor:

PhD - Université de Montréal

Co-supervisor:

Collaborating researcher - Université de Montréal

Mohammad Pezeshki

Collaborating researcher

Principal supervisor:

Irina Rish

Julia Price

Master's Research - Université de Montréal

Mauricio Rivera

Master's Research - Université de Montréal

Principal supervisor:

Marco Bonizzato

Avery Ryoo

PhD - Université de Montréal

Principal supervisor:

PhD - Université de Montréal

Co-supervisor:

Lune Bellec

Ryan Vogt

Postdoctorate - Université de Montréal

Ezekiel Williams

PhD - Université de Montréal

Jieyu Zhao

Independent visiting researcher - University of Southern California

Machine Learning for the Segmentation of Different Nerve Fibre Activations from Brain-to-body Neural Signals

Blog Posts

Graphical representation of a vagus nerve

May 21, 2025

Param Raval

Olivier Tessier-Larivière

Pascal Fortier-Poisson

Blake Richards

Guillaume Lajoie


June 13, 2024

What Do Synaptic Weight Distributions Tell Us About Learning in the Brain?

Roman Pogodin

Jonathan Cornford

Arna Ghosh

Gauthier Gidel

Guillaume Lajoie

Blake Richards


Publications

JEDI: Jointly Embedded Inference of Neural Dynamics

Matthew G. Perich

Animal brains flexibly and efficiently achieve many behavioral tasks with a single neural network. A core goal in modern neuroscience is to map the mechanisms of the brain's flexibility onto the dynamics underlying neural populations. However, identifying task-specific dynamical rules from limited, noisy, and high-dimensional experimental neural recordings remains a major challenge, as experimental data often provide only partial access to brain states and dynamical mechanisms. While recurrent neural networks (RNNs) directly constrained by neural data have been effective in inferring underlying dynamical mechanisms, they are typically limited to single-task domains and struggle to generalize across behavioral conditions. Here, we introduce JEDI, a hierarchical model that captures neural dynamics across tasks and contexts by learning a shared embedding space over RNN weights. This model recapitulates individual samples of neural dynamics while scaling to arbitrarily large and complex datasets, uncovering shared structure across conditions in a single, unified model. Using simulated RNN datasets, we demonstrate that JEDI accurately learns robust, generalizable, condition-specific embeddings. By reverse-engineering the weights learned by JEDI, we show that it recovers ground-truth fixed-point structures and unveils key features of the underlying neural dynamics in the eigenspectra. Finally, we apply JEDI to motor cortex recordings during monkey reaching to extract mechanistic insight into the neural dynamics of motor control. Our work shows that joint learning of contextual embeddings and recurrent weights provides scalable and generalizable inference of brain dynamics from recordings alone.

2026-03-10

arXiv (preprint)

Evolutionarily conserved neural dynamics across mice, monkeys, and humans

Olivier Codol

Margaux Asclipe

Anton R Sobinov

Z. Jeffrey Chen

Junchol Park

Nicholas G. Hatsopoulos

Joshua T. Dudman

Juan Álvaro Gallego

Matthew G. Perich

Zihao Chen

On evolutionary timescales, brain circuits adapt to support survival in each species' ecological niche. While some anatomical aspects of neural circuitry are conserved across species with distant evolutionary origins, each species also exhibits specific circuit adaptations that enable its behavioral repertoire. It remains unclear whether homologous brain regions leverage analogous neural computations as different species perform common behaviors such as reaching and manipulating objects. Here, we directly assessed conservation of neural computations using intracortical recordings from mouse, monkey, and human motor cortex (a homologous region across many mammals) during motor behaviors crucial for survival. We hypothesized that, despite their phylogenetic distance, rodents and primates produce movements through conserved neural computations implemented by motor cortical population dynamics. Remarkably, we found that movement-related neural dynamics were highly conserved across species, while variations in behavioral output were uniquely captured in neural trajectory geometries. Strikingly, neural dynamics during movement were more conserved across species than across brain regions in the same human or between motor preparation and execution in the same monkeys. Lastly, through manipulation of neural network models trained to perform reaching movements, we provide further evidence that conservation of neural dynamics across species likely stems from shared circuit constraints. We thus assert that evolution maintains neural computations across phylogeny even as behavioral repertoires expand.

2026-03-06

bioRxiv (preprint)

Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

Seijin Kobayashi

Yanick Schimpf

Maximilian Schlegel

Angelika Steger

Maciej Wolczyk

Johannes Von Oswald

Nino Scherrer

Kaitlin Maile

Blake Aaron Richards

Rif A. Saurous

James Manyika

Blaise Agüera y Arcas

Alexander Meulemans

João Sacramento

Large-scale autoregressive models pretrained on next-token prediction and fine-tuned with reinforcement learning (RL) have achieved unprecedented success on many problem domains. During RL, these models explore by generating new outputs, one token at a time. However, sampling actions token by token can result in highly inefficient learning, particularly when rewards are sparse. Here, we show that it is possible to overcome this problem by acting and exploring within the internal representations of an autoregressive model. Specifically, to discover temporally abstract actions, we introduce a higher-order, non-causal sequence model whose outputs control the residual stream activations of a base autoregressive model. On grid world and MuJoCo-based tasks with hierarchical structure, we find that the higher-order model learns to compress long activation sequence chunks onto internal controllers. Critically, each controller executes a sequence of behaviorally meaningful actions that unfold over long timescales and are accompanied by a learned termination condition, such that composing multiple controllers over time leads to efficient exploration on novel tasks. We show that direct internal controller reinforcement, a process we term "internal RL", enables learning from sparse rewards in cases where standard RL fine-tuning fails. Our results demonstrate the benefits of latent action generation and reinforcement in autoregressive models, suggesting internal RL as a promising avenue for realizing hierarchical RL within foundation models.

2025-12-22

arXiv (preprint)

Training neural networks from scratch in a videogame leads to brittle brain encoding

Basile Pinsard

Recent brain-encoding studies using video game tasks suggest that the training objective of an artificial neural network plays a central role in how well the network's representations align with brain activity. This study investigates the alignment of artificial neural network activations with brain activity elicited by a video game task, using models trained from scratch in controlled settings. We specifically compared three model training objectives (reinforcement learning, imitation learning, and a vision task) while accounting for other factors that may impact performance, such as training data and model architecture. We tested models on brain encoding, i.e., their ability to predict functional magnetic resonance imaging (fMRI) signals acquired while human subjects played different levels of the video game Super Mario Bros. When tested on new playthroughs of the game levels seen during training, the reinforcement learning objective had a small but significant advantage in brain encoding, followed by the imitation learning and vision models. We hypothesized that brain-aligned representations would emerge only in task-competent models, and that the specific brain regions well encoded by a model would depend on the nature of the task it was trained on. While brain encoding did improve during model training, even an untrained model with matching architecture approached the performance of the best models. Contrary to our hypotheses, no model layers or specific training objectives aligned preferentially with specific brain areas. Large performance gaps also persisted in fully trained models across game levels, both those seen during training and entirely novel ones. Overall, even though reinforcement learning offered a small advantage for training brain encoding models on video game data, all tested brain encoding models exhibited brittle performance with limited generalization both within and out of distribution.
Taken together, our results suggest that training small artificial models from scratch is not sufficiently reliable, and that incorporating pretrained models such as foundation vision–action models may ultimately be necessary to support robust inferences about brain representations.

2025-12-01

bioRxiv (preprint)

Embedded Universal Predictive Intelligence: a coherent framework for multi-agent learning

Alexander Meulemans

Rajai Nasser

Maciej Wolczyk

Marissa A. Weis

Seijin Kobayashi

Blake Aaron Richards

Angelika Steger

Marcus Hutter

James Manyika

Rif A. Saurous

João Sacramento

Blaise Agüera y Arcas

2025-11-26

arXiv (preprint)

Iterative Amortized Inference: Unifying In-Context Learning and Learned Optimizers

2025-10-12

arXiv (preprint)

Towards a Formal Theory of Representational Compositionality

2025-10-05

Proceedings of the 42nd International Conference on Machine Learning (published)

proceedings.mlr.press

Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models

Johan Samir Obando Ceron

Yoshua Bengio

Brian R. Bartoldson

Bhavya Kailkhura

Glen Berseth

Nikolay Malkin

Moksh J. Jain

Test-time scaling methods improve the capabilities of large language models (LLMs) by increasing the amount of compute used during inference to make a prediction. Inference-time compute can be scaled in parallel, by choosing among multiple independent solutions, or sequentially, through self-refinement. We propose Recursive Self-Aggregation (RSA), a test-time scaling method inspired by evolutionary methods that combines the benefits of both parallel and sequential scaling. Each step of RSA refines a population of candidate reasoning chains by aggregating subsets of them into a population of improved solutions, which then serves as the candidate pool for the next iteration. RSA exploits the rich information embedded in the reasoning chains (not just the final answers) and enables bootstrapping from partially correct intermediate steps within different chains of thought. Empirically, RSA delivers substantial performance gains with increasing compute budgets across diverse tasks, model families, and sizes. Notably, RSA enables Qwen3-4B-Instruct-2507 to achieve competitive performance with larger reasoning models, including DeepSeek-R1 and o3-mini (high), while outperforming purely parallel and sequential scaling strategies across AIME-25, HMMT-25, Reasoning Gym, LiveCodeBench-v6, and SuperGPQA. We further demonstrate that training the model to combine solutions via a novel aggregation-aware reinforcement learning approach yields significant performance gains. Code available at https://github.com/HyperPotatoNeo/RSA.

2025-09-29

arXiv (preprint)

Inferring dynamical features from neural data through joint learning of latent factors and weights

Anirudh Gururaj Jamkhandi

Ali Korojy

Olivier Codol

Matthew G Perich

Behavior arises from coordinated synaptic changes in recurrent neural populations. Inferring the underlying dynamics from limited, noisy, and high-dimensional neural recordings is a major challenge, as experimental data often provide only partial access to brain states. While data-driven recurrent neural networks (dRNNs) have been effective for modeling such dynamics, they are typically limited to single-task domains and struggle to generalize across behavioral conditions. Here, we propose a hierarchical model that captures neural dynamics across multiple behavioral contexts by learning a shared embedding space over RNN weights. We demonstrate that our model captures diverse neural dynamics with a single, unified model, using both simulated datasets of many tasks and neural recordings during monkey reaching. Using the learned task embeddings, we demonstrate accurate classification of dynamical regimes and generalization to unseen samples. Crucially, spectral analysis of the learned weights provides valuable insights into network computations, highlighting the potential of joint embedding–weight learning for scalable inference of brain dynamics.

2025-09-22

NeurIPS 2025 Workshop: NeurReps (poster)

Towards a generalizable, unified framework for decoding from multimodal neural activity

Nanda H Krishna

Mathys Loiselle

Avery Hee-Woon Ryoo

Matthew G Perich

Recent advances in neural decoding have led to the development of large-scale deep learning-based neural decoders that can generalize across sessions and subjects. However, existing approaches predominantly focus on single modalities of neural activity, limiting their applicability to specific modalities and tasks. In this work, we present a multimodal extension of the POYO framework that jointly processes neuronal spikes and local field potentials (LFPs) for behavioural decoding. Our approach employs flexible tokenization schemes for both spikes and LFPs, enabling efficient processing of heterogeneous neural populations without preprocessing requirements like binning. Through experiments on data from nonhuman primates performing motor tasks, we demonstrate that multimodal pretraining yields superior decoding performance compared to unimodal baselines. We also show evidence of cross-modal transfer: models pretrained on both modalities outperform LFP-only models when fine-tuned solely on LFPs, suggesting a path toward more cost-effective brain-computer interfaces that can use performant LFP-based decoders. Our models also exhibit robustness to missing modalities during inference when trained with modality masking, and scale effectively with both model size and pretraining data. Overall, this work represents an important first step towards unified, general-purpose neural decoders capable of leveraging diverse neural signals for a variety of brain-computer interface applications.

2025-09-21

NeurIPS 2025 Workshop: BrainBodyFM (published)

Generalizable, real-time neural decoding with hybrid state-space models

Avery Hee-Woon Ryoo

Nanda H Krishna

Ximeng Mao

Mehdi Azabou

Eva L Dyer

Matthew G Perich

Real-time decoding of neural activity is central to neuroscience and neurotechnology applications, from closed-loop experiments to brain-computer interfaces, where models are subject to strict latency constraints. Traditional methods, including simple recurrent neural networks, are fast and lightweight but often struggle to generalize to unseen data. In contrast, recent Transformer-based approaches leverage large-scale pretraining for strong generalization performance, but typically have much larger computational requirements and are not always suitable for low-resource or real-time settings. To address these shortcomings, we present POSSM, a novel hybrid architecture that combines individual spike tokenization via a cross-attention module with a recurrent state-space model (SSM) backbone to enable (1) fast and causal online prediction on neural activity and (2) efficient generalization to new sessions, individuals, and tasks through multi-dataset pretraining. We evaluate POSSM's decoding performance and inference speed on intracortical decoding of monkey motor tasks, and show that it extends to clinical applications, namely handwriting and speech decoding in human subjects. Notably, we demonstrate that pretraining on monkey motor-cortical recordings improves decoding performance on the human handwriting task, highlighting the exciting potential for cross-species transfer. In all of these tasks, we find that POSSM achieves decoding accuracy comparable to state-of-the-art Transformers, at a fraction of the inference cost (up to 9x faster on GPU). These results suggest that hybrid SSMs are a promising approach to bridging the gap between accuracy, inference speed, and generalization when training neural decoders for real-time, closed-loop applications.

2025-09-17

NeurIPS 2025 Conference (poster)

Tracing the Representation Geometry of Language Models from Pretraining to Post-training

Melody Zixuan Li

Kumar Krishna Agrawal

Arna Ghosh

Komal Kumar Teru

Adam Santoro

Blake A. Richards

Standard training metrics like loss fail to explain the emergence of complex capabilities in large language models. We take a spectral approach to investigate the geometry of learned representations across pretraining and post-training, measuring effective rank (RankMe) and eigenspectrum decay (…

2025-09-17

NeurIPS 2025 Conference (poster)