Guillaume Lajoie

Biography

Guillaume Lajoie is an Associate professor in the Department of Mathematics and Statistics at Université de Montréal and a Core Academic Member of Mila – Quebec Artificial Intelligence Institute. He holds a Canada-CIFAR AI Research Chair, and a Canada Research Chair (CRC) in Neural Computation and Interfacing.

His research is positioned at the intersection of AI and Neuroscience where he develops tools to better understand mechanisms of intelligence common to both biological and artificial systems. His research group's contributions range from advances in multi-scale learning paradigms for large artificial systems, to applications in neurotechnology. Dr. Lajoie is actively involved in responsible AI development efforts, seeking to identify guidelines and best practices for use of AI in research and beyond.

Current Students

Federico Arangath Joseph

Collaborating researcher - ETH Zurich

Stefan Bauer

Independent visiting researcher

Principal supervisor :

Sangnie Bhardwaj

PhD - Université de Montréal

Co-supervisor :

Hugo Larochelle

Colin Bredenberg

Postdoctorate - Université de Montréal

Co-supervisor :

Blake Richards

Leo Choiniere

PhD - Université de Montréal

Olivier Codol

Postdoctorate - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Principal supervisor :

Leo Gagnon

PhD - Université de Montréal

Tom George

Postdoctorate - McGill University

Principal supervisor :

Skylar Gu

Research Intern - McGill University

Principal supervisor :

Dhanya Sridhar

Juan Guerra

Master's Research - Polytechnique Montréal

Principal supervisor :

Nanda Harishankar Krishna

PhD - Université de Montréal

Anna Jahn

Independent visiting researcher - McGill University

Chen Jiang

PhD - McGill University

Principal supervisor :

Paul Masset

Thomas Jiralerspong

PhD - Université de Montréal

Co-supervisor :

Master's Research - Université de Montréal

Co-supervisor :

Research Intern - Concordia University

Co-supervisor :

Matt Perich

Ximeng Mao

PhD - Université de Montréal

Co-supervisor :

Joelle Pineau

Abdel Mfougouon Njupoun

PhD - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Co-supervisor :

Collaborating researcher - Université de Montréal

Mohammad Pezeshki

Collaborating researcher

Principal supervisor :

Irina Rish

Julia Price

Master's Research - Université de Montréal

Mauricio Rivera

Master's Research - Université de Montréal

Principal supervisor :

Marco Bonizzato

Avery Ryoo

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

Co-supervisor :

Lune Bellec

Ryan Vogt

Postdoctorate - Université de Montréal

Ezekiel Williams

PhD - Université de Montréal

Machine Learning for the Segmentation of Different Nerve Fibre Activations from Brain-to-body Neural Signals

Blog Posts

Représentation graphique d'un nerf vague

May 21, 2025

Param Raval

Olivier Tessier-Larivière

Pascal Fortier-Poisson

Blake Richards

Guillaume Lajoie

Read the article

June 13, 2024

What Do Synaptic Weight Distributions Tell Us About Learning in the Brain ?

Roman Pogodin

Jonathan Cornford

Arna Ghosh

Gauthier Gidel

Guillaume Lajoie

Blake Richards

Read the article

Publications

A connectomics-based taxonomy of mammals

Laura E. Suárez

Yossi Yovel

Martijn P. van den Heuvel

Olaf Sporns

Yaniv Assaf

Bratislav Mišić

2022-03-12

bioRxiv (preprint)

A connectomics-based taxonomy of mammals

Laura E. Suárez

Yossi Yovel

M. P. van den Heuvel

Olaf Sporns

Yaniv Assaf

Bratislav Mišić

Mammalian taxonomies are conventionally defined by morphological traits and genetics. How species differ in terms of neural circuits and whe… (see more)ther inter-species differences in neural circuit organization conform to these taxonomies is unknown. The main obstacle for the comparison of neural architectures have been differences in network reconstruction techniques, yielding species-specific connectomes that are not directly comparable to one another. Here we comprehensively chart connectome organization across the mammalian phylogenetic spectrum using a common reconstruction protocol. We analyze the mammalian MRI (MaMI) data set, a database that encompasses high-resolution ex vivo structural and diffusion magnetic resonance imaging (MRI) scans of 124 species across 12 taxonomic orders and 5 superorders, collected using a single protocol on a single scanner. We assess similarity between species connectomes using two methods: similarity of Laplacian eigenspectra and similarity of multiscale topological features. We find greater inter-species similarities among species within the same taxonomic order, suggesting the connectome organization recapitulates traditional taxonomies defined by morphology and genetics. While all connectomes retain hallmark global features and relative proportions of connection classes, inter-species variation is driven by local regional connectivity profiles. By encoding connectomes into a common frame of reference, these findings establish a foundation for investigating how neural circuits change over phylogeny, forging a link from genes to circuits to behaviour.

2022-03-12

bioRxiv (preprint)

Continuous-Time Meta-Learning with Forward Mode Differentiation

Drawing inspiration from gradient-based meta-learning methods with infinitely small gradient steps, we introduce Continuous-Time Meta-Learni… (see more)ng (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field. Specifically, representations of the inputs are meta-learned such that a task-specific linear classifier is obtained as a solution of an ordinary differential equation (ODE). Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous, as opposed to a fixed and discrete number of gradient steps. As a consequence, we can optimize the amount of adaptation necessary to solve a new task using stochastic gradient descent, in addition to learning the initial conditions as is standard practice in gradient-based meta-learning. Importantly, in order to compute the exact meta-gradients required for the outer-loop updates, we devise an efficient algorithm based on forward mode differentiation, whose memory requirements do not scale with the length of the learning trajectory, thus allowing longer adaptation in constant memory. We provide analytical guarantees for the stability of COMLN, we show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.

2022-01-28

ICLR.cc/2022/Conference (spotlight)

Beyond accuracy: generalization properties of bio-plausible temporal credit assignment rules

Yuhan Helena Liu

Eric Todd SheaBrown

Compositional Attention: Disentangling Search and Retrieval

Sarthak Mittal

Sharath Chandra Raparthy

Irina Rish

Multi-head, key-value attention is the backbone of transformer-like model architectures which have proven to be widely successful in recent … (see more)years. This attention mechanism uses multiple parallel key-value attention blocks (called heads), each performing two fundamental computations: (1) search - selection of a relevant entity from a set via query-key interaction, and (2) retrieval - extraction of relevant features from the selected entity via a value matrix. Standard attention heads learn a rigid mapping between search and retrieval. In this work, we first highlight how this static nature of the pairing can potentially: (a) lead to learning of redundant parameters in certain tasks, and (b) hinder generalization. To alleviate this problem, we propose a novel attention mechanism, called Compositional Attention, that replaces the standard head structure. The proposed mechanism disentangles search and retrieval and composes them in a dynamic, flexible and context-dependent manner. Through a series of numerical experiments, we show that it outperforms standard multi-head attention on a variety of tasks, including some out-of-distribution settings. Through our qualitative analysis, we demonstrate that Compositional Attention leads to dynamic specialization based on the type of retrieval needed. Our proposed mechanism generalizes multi-head attention, allows independent scaling of search and retrieval and is easy to implement in a variety of established network architectures.

2022-01-01

International Conference on Learning Representations (published)

Goal-driven optimization of single-neuron properties in artiﬁcial networks reveals regularization role of neural diversity and adaptation in the brain

Neurons in the brain have rich and adaptive input-output properties. Features such as diverse f-I curves and spike frequency adaptation are … (see more)known to place single neurons in optimal coding regimes when facing changing stimuli. Yet, it is still unclear how brain circuits exploit single neuron ﬂexibility, and how network-level requirements may have shaped such cellular function. To answer this question, a multi-scaled approach is needed where the computations of single neurons and of neural circuits must be considered as a complete system. In this work, we use artiﬁcial neural networks to systematically investigate single neuron input-output adaptive mechanisms, optimized in an end-to-end fashion. Throughout the optimization process, each neuron has the liberty to modify its nonlinear activation function, parametrized to mimic f-I curves of biological neurons, and to learn adaptation strategies to modify activation functions in real-time during a task. We ﬁnd that such networks show much-improved robustness to noise and changes in input statistics. Importantly, we ﬁnd that this procedure recovers precise coding strategies found in biological neurons, such as gain scaling and fractional order differentiation/integration. Using tools from dynamical systems theory, we analyze the role of these emergent single neuron properties and argue that neural diversity and adaptation plays an active regularization role that enables neural circuits to optimally propagate information across time.

2022-01-01

(published)

www.semanticscholar.org

Goal-driven optimization of single-neuron properties in artiﬁcial networks reveals regularization role of neural diversity and adaptation in the brain

Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty

Thomas George

Aristide Baratin

Among attempts at giving a theoretical account of the success of deep neural networks, a recent line of work has identified a so-called `laz… (see more)y' training regime in which the network can be well approximated by its linearization around initialization. Here we investigate the comparative effect of the lazy (linear) and feature learning (non-linear) regimes on subgroups of examples based on their difficulty. Specifically, we show that easier examples are given more weight in feature learning mode, resulting in faster training compared to more difficult ones. In other words, the non-linear dynamics tends to sequentialize the learning of examples of increasing difficulty. We illustrate this phenomenon across different ways to quantify example difficulty, including c-score, label noise, and in the presence of easy-to-learn spurious correlations. Our results reveal a new understanding of how deep networks prioritize resources across example difficulty.

2022-01-01

Trans. Mach. Learn. Res. (published)

Is a Modular Architecture Enough?

Sarthak Mittal

Inspired from human cognition, machine learning systems are gradually revealing advantages of sparser and more modular architectures. Recent… (see more) work demonstrates that not only do some modular architectures generalize well, but they also lead to better out of distribution generalization, scaling properties, learning speed, and interpretability. A key intuition behind the success of such systems is that the data generating system for most real-world settings is considered to consist of sparse modular connections, and endowing models with similar inductive biases will be helpful. However, the field has been lacking in a rigorous quantitative assessment of such systems because these real-world data distributions are complex and unknown. In this work, we provide a thorough assessment of common modular architectures, through the lens of simple and known modular data distributions. We highlight the benefits of modularity and sparsity and reveal insights on the challenges faced while optimizing modular systems. In doing so, we propose evaluation metrics that highlight the benefits of modularity, the regimes in which these benefits are substantial, as well as the sub-optimality of current end-to-end learned modular systems as opposed to their claimed potential.

Gradient Starvation: A Learning Proclivity in Neural Networks

Sékou-Oumar Kaba

We identify and formalize a fundamental gradient descent phenomenon resulting in a learning proclivity in over-parameterized neural networks… (see more). Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task, despite the presence of other predictive features that fail to be discovered. This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks. Using tools from Dynamical Systems theory, we identify simple properties of learning dynamics during gradient descent that lead to this imbalance, and prove that such a situation can be expected given certain statistical structure in training data. Based on our proposed formalism, we develop guarantees for a novel regularization method aimed at decoupling feature learning dynamics, improving accuracy and robustness in cases hindered by gradient starvation. We illustrate our findings with simple and real-world out-of-distribution (OOD) generalization experiments.

Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning

Nan Rosemary Ke

Aniket Rajiv Didolkar

Danilo Jimenez Rezende

Michael Curtis Mozer

Chris Pal

Inducing causal relationships from observations is a classic problem in machine learning. Most work in causality starts from the premise tha… (see more)t the causal variables themselves are observed. However, for AI agents such as robots trying to make sense of their environment, the only observables are low-level variables like pixels in images. To generalize well, an agent must induce high-level variables, particularly those which are causal or are affected by causal variables. A central goal for AI and causality is thus the joint discovery of abstract representations and causal structure. However, we note that existing environments for studying causal induction are poorly suited for this objective because they have complicated task-specific causal graphs which are impossible to manipulate parametrically (e.g., number of nodes, sparsity, causal chain length, etc.). In this work, our goal is to facilitate research in learning representations of high-level variables as well as causal structures among them. In order to systematically probe the ability of methods to identify these variables and structures, we design a suite of benchmarking RL environments. We evaluate various representation learning algorithms from the literature and find that explicitly incorporating structure and modularity in models can help causal induction in model-based reinforcement learning.

2021-10-11

NeurIPS.cc/2021/Track/Datasets_and_Benchmarks/Round2 (published)

Learning function from structure in neuromorphic networks

Laura E. Suárez

Blake Richards

Bratislav Mišić

2021-08-09

Nature Machine Intelligence (published)