Guillaume Lajoie

Biography

Guillaume Lajoie is an Associate professor in the Department of Mathematics and Statistics at Université de Montréal and a Core Academic Member of Mila – Quebec Artificial Intelligence Institute. He holds a Canada-CIFAR AI Research Chair, and a Canada Research Chair (CRC) in Neural Computation and Interfacing.

His research is positioned at the intersection of AI and Neuroscience where he develops tools to better understand mechanisms of intelligence common to both biological and artificial systems. His research group's contributions range from advances in multi-scale learning paradigms for large artificial systems, to applications in neurotechnology. Dr. Lajoie is actively involved in responsible AI development efforts, seeking to identify guidelines and best practices for use of AI in research and beyond.

Current Students

Federico Arangath Joseph

Collaborating researcher - ETH Zurich

Stefan Bauer

Independent visiting researcher

Principal supervisor :

Yoshua Bengio

Sangnie Bhardwaj

PhD - Université de Montréal

Co-supervisor :

Hugo Larochelle

Colin Bredenberg

Postdoctorate - Université de Montréal

Co-supervisor :

Blake Richards

Leo Choiniere

PhD - Université de Montréal

Olivier Codol

Postdoctorate - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Principal supervisor :

Leo Gagnon

PhD - Université de Montréal

Tom George

Postdoctorate - McGill University

Principal supervisor :

Skylar Gu

Research Intern - McGill University

Principal supervisor :

Dhanya Sridhar

Juan Guerra

Master's Research - Polytechnique Montréal

Principal supervisor :

Nanda Harishankar Krishna

PhD - Université de Montréal

Chen Jiang

PhD - McGill University

Principal supervisor :

Paul Masset

Thomas Jiralerspong

PhD - Université de Montréal

Co-supervisor :

Master's Research - Université de Montréal

Co-supervisor :

Research Intern - Concordia University

Co-supervisor :

Matt Perich

Ximeng Mao

PhD - Université de Montréal

Co-supervisor :

Joelle Pineau

Abdel Mfougouon Njupoun

PhD - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Co-supervisor :

Collaborating researcher - Université de Montréal

Mohammad Pezeshki

Collaborating researcher

Principal supervisor :

Irina Rish

Julia Price

Master's Research - Université de Montréal

Avery Ryoo

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

Co-supervisor :

Lune Bellec

Ryan Vogt

Postdoctorate - Université de Montréal

Ezekiel Williams

PhD - Université de Montréal

Machine Learning for the Segmentation of Different Nerve Fibre Activations from Brain-to-body Neural Signals

Blog Posts

Représentation graphique d'un nerf vague

May 21, 2025

Param Raval

Olivier Tessier-Larivière

Pascal Fortier-Poisson

Blake Richards

Guillaume Lajoie

Read the article

June 13, 2024

What Do Synaptic Weight Distributions Tell Us About Learning in the Brain ?

Roman Pogodin

Jonathan Cornford

Arna Ghosh

Gauthier Gidel

Guillaume Lajoie

Blake Richards

Read the article

Publications

Iterative Amortized Inference: Unifying In-Context Learning and Learned Optimizers

2025-10-13

ArXiv (preprint)

arxiv.org

Iterative Amortized Inference: Unifying In-Context Learning and Learned Optimizers

Modern learning systems increasingly rely on amortized learning - the idea of reusing computation or inductive biases shared across tasks to… (see more) enable rapid generalization to novel problems. This principle spans a range of approaches, including meta-learning, in-context learning, prompt tuning, learned optimizers and more. While motivated by similar goals, these approaches differ in how they encode and leverage task-specific information, often provided as in-context examples. In this work, we propose a unified framework which describes how such methods differ primarily in the aspects of learning they amortize - such as initializations, learned updates, or predictive mappings - and how they incorporate task data at inference. We introduce a taxonomy that categorizes amortized models into parametric, implicit, and explicit regimes, based on whether task adaptation is externalized, internalized, or jointly modeled. Building on this view, we identify a key limitation in current approaches: most methods struggle to scale to large datasets because their capacity to process task data at inference (e.g., context length) is often limited. To address this, we propose iterative amortized inference, a class of models that refine solutions step-by-step over mini-batches, drawing inspiration from stochastic optimization. Our formulation bridges optimization-based meta-learning with forward-pass amortization in models like LLMs, offering a scalable and extensible foundation for general-purpose task adaptation.

2025-10-13

ArXiv (preprint)

arxiv.org

Does learning the right latent variables necessarily improve in-context learning?

Large autoregressive models like Transformers can solve tasks through in-context learning (ICL) without learning new weights, suggesting ave… (see more)nues for efficiently solving new tasks. For many tasks, e.g., linear regression, the data factorizes: examples are independent given a task latent that generates the data, e.g., linear coefficients. While an optimal predictor leverages this factorization by inferring task latents, it is unclear if Transformers implicitly do so or if they instead exploit heuristics and statistical shortcuts enabled by attention layers. Both scenarios have inspired active ongoing work. In this paper, we systematically investigate the effect of explicitly inferring task latents. We minimally modify the Transformer architecture with a bottleneck designed to prevent shortcuts in favor of more structured solutions, and then compare performance against standard Transformers across various ICL tasks. Contrary to intuition and some recent works, we find little discernible difference between the two; biasing towards task-relevant latent variables does not lead to better out-of-distribution performance, in general. Curiously, we find that while the bottleneck effectively learns to extract latent task variables from context, downstream processing struggles to utilize them for robust prediction. Our study highlights the intrinsic limitations of Transformers in achieving structured ICL solutions that generalize, and shows that while inferring the right latents aids interpretability, it is not sufficient to alleviate this problem.

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (published)

doi.org

openreview.net

Does learning the right latent variables necessarily improve in-context learning?

Large autoregressive models like Transformers can solve tasks through in-context learning (ICL) without learning new weights, suggesting ave… (see more)nues for efficiently solving new tasks. For many tasks, e.g., linear regression, the data factorizes: examples are independent given a task latent that generates the data, e.g., linear coefficients. While an optimal predictor leverages this factorization by inferring task latents, it is unclear if Transformers implicitly do so or instead exploit heuristics and statistical shortcuts through attention layers. In this paper, we systematically investigate the effect of explicitly inferring task latents by minimally modifying the Transformer architecture with a bottleneck to prevent shortcuts and incentivize structured solutions. We compare it against standard Transformers across various ICL tasks and find that contrary to intuition and recent works, there is little discernible difference between the two; biasing towards task-relevant latent variables does not lead to better out-of-distribution performance, in general. Curiously, we find that while the bottleneck effectively learns to extract latent task variables from context, downstream processing struggles to utilize them for robust prediction. Our study highlights the intrinsic limitations of Transformers in achieving structured ICL solutions that generalize, and shows that while inferring the right latents aids interpretability, it is not sufficient to alleviate this problem.

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (published)

In-Context Learning and Occam’s Razor

A central goal of machine learning is generalization. While the No Free Lunch Theorem states that we cannot obtain theoretical guarantees fo… (see more)r generalization without further assumptions, in practice we observe that simple models which explain the training data generalize best—a principle called Occam’s razor. Despite the need for simple models, most current approaches in machine learning only minimize the training error, and at best indirectly promote simplicity through regularization or architecture design. Here, we draw a connection between Occam’s razor and in-context learning—an emergent ability of certain sequence models like Transformers to learn at inference time from past observations in a sequence. In particular, we show that the next-token prediction loss used to train in-context learners is directly equivalent to a data compression technique called prequential coding, and that minimizing this loss amounts to jointly minimizing both the training error and the complexity of the model that was implicitly learned from context. Our theory and the empirical experiments we use to support it not only provide a normative account of in-context learning, but also elucidate the shortcomings of current in-context learning methods, suggesting ways in which they can be improved. We make our code available at https://github.com/3rdCore/PrequentialCode.

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (published)

In-context learning and Occam's razor

A central goal of machine learning is generalization. While the No Free Lunch Theorem states that we cannot obtain theoretical guarantees fo… (see more)r generalization without further assumptions, in practice we observe that simple models which explain the training data generalize best: a principle called Occam's razor. Despite the need for simple models, most current approaches in machine learning only minimize the training error, and at best indirectly promote simplicity through regularization or architecture design. Here, we draw a connection between Occam's razor and in-context learning: an emergent ability of certain sequence models like Transformers to learn at inference time from past observations in a sequence. In particular, we show that the next-token prediction loss used to train in-context learners is directly equivalent to a data compression technique called prequential coding, and that minimizing this loss amounts to jointly minimizing both the training error and the complexity of the model that was implicitly learned from context. Our theory and the empirical experiments we use to support it not only provide a normative account of in-context learning, but also elucidate the shortcomings of current in-context learning methods, suggesting ways in which they can be improved. We make our code available at https://github.com/3rdCore/PrequentialCode.

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (published)

doi.org

openreview.net

Towards a Formal Theory of Representational Compositionality

Compositionality is believed to be fundamental to intelligence. In humans, it underlies the structure of thought and language. In AI, it ena… (see more)bles a powerful form of out-of-distribution generalization, in which a model systematically adapts to novel combinations of known concepts. However, while we have strong intuitions about what compositionality is, we lack satisfying formal definitions for it. Here, we propose such a definition called representational compositionality that is conceptually simple, quantitative, and grounded in algorithmic information theory. Intuitively, representational compositionality states that a compositional representation is both expressive and describable as a simple function of parts. We validate our definition on both real and synthetic data, and show how it unifies disparate intuitions from across the literature in both AI and cognitive science. We hope that our definition can inspire the design of novel, theoretically-driven models that better capture the mechanisms of compositional thought. We make our code available at https://github.com/EricElmoznino/complexity_compositionality.

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (published)

openreview.net

Towards a Formal Theory of Representational Compositionality

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (published)