Guillaume Lajoie

Biography

Guillaume Lajoie is an Associate professor in the Department of Mathematics and Statistics at Université de Montréal and a Core Academic Member of Mila – Quebec Artificial Intelligence Institute. He holds a Canada-CIFAR AI Research Chair, and a Canada Research Chair (CRC) in Neural Computation and Interfacing.

His research is positioned at the intersection of AI and Neuroscience where he develops tools to better understand mechanisms of intelligence common to both biological and artificial systems. His research group's contributions range from advances in multi-scale learning paradigms for large artificial systems, to applications in neurotechnology. Dr. Lajoie is actively involved in responsible AI development efforts, seeking to identify guidelines and best practices for use of AI in research and beyond.

Current Students

Federico Arangath Joseph

Collaborating researcher - ETH Zurich

Stefan Bauer

Independent visiting researcher

Principal supervisor :

Yoshua Bengio

Sangnie Bhardwaj

PhD - Université de Montréal

Co-supervisor :

Hugo Larochelle

Colin Bredenberg

Postdoctorate - Université de Montréal

Co-supervisor :

Blake Richards

Leo Choiniere

PhD - Université de Montréal

Olivier Codol

Postdoctorate - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

Principal supervisor :

Leo Gagnon

PhD - Université de Montréal

Skylar Gu

Research Intern - McGill University

Principal supervisor :

Juan Guerra

Master's Research - Polytechnique Montréal

Principal supervisor :

Marco Bonizzato

Nanda Harishankar Krishna

PhD - Université de Montréal

Collaborating researcher - Western Washington University (faculty; assistant prof))

Principal supervisor :

PhD - Université de Montréal

Co-supervisor :

Master's Research - Université de Montréal

Co-supervisor :

tejaskasetty@gmail.com

Ximeng Mao

PhD - Université de Montréal

Co-supervisor :

Joelle Pineau

Abdel Mfougouon Njupoun

PhD - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Co-supervisor :

Amine Natik

PhD - Université de Montréal

Co-supervisor :

Guy Wolf

Alexandre Payeur

Collaborating researcher - Université de Montréal

Mohammad Pezeshki

Collaborating researcher

Principal supervisor :

Collaborating Alumni - McGill University

Principal supervisor :

Julia Price

Master's Research - Université de Montréal

Param Raval

Collaborating Alumni - Université de Montréal

Avery Ryoo

Master's Research - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

Co-supervisor :

Lune Bellec

Ayesha Vermani

Independent visiting researcher - Champalimeau Institute for the Unknown

Ryan Vogt

Postdoctorate - Université de Montréal

Vivian White

Research Intern - Western Washington University

Co-supervisor :

PhD - Université de Montréal

Machine Learning for the Segmentation of Different Nerve Fibre Activations from Brain-to-body Neural Signals

Blog Posts

Représentation graphique d'un nerf vague

May 21, 2025

Param Raval

Olivier Tessier-Larivière

Pascal Fortier-Poisson

Blake Richards

Guillaume Lajoie

Read the article

June 13, 2024

What Do Synaptic Weight Distributions Tell Us About Learning in the Brain ?

Roman Pogodin

Jonathan Cornford

Arna Ghosh

Gauthier Gidel

Guillaume Lajoie

Blake Richards

Read the article

Publications

When can transformers compositionally generalize in-context?

Seijin Kobayashi

Simon Schug

Yassir Akram

Florian Redhardt

Johannes Von Oswald

Razvan Pascanu

João Sacramento

Many tasks can be composed from a few independent components. This gives rise to a combinatorial explosion of possible tasks, only some of w… (see more)hich might be encountered during training. Under what circumstances can transformers compositionally generalize from a subset of tasks to all possible combinations of tasks that share similar components? Here we study a modular multitask setting that allows us to precisely control compositional structure in the data generation process. We present evidence that transformers learning in-context struggle to generalize compositionally on this task despite being in principle expressive enough to do so. Compositional generalization becomes possible only when introducing a bottleneck that enforces an explicit separation between task inference and task execution.

2024-07-17

ArXiv (preprint)

arxiv.org

A benchmark of individual auto-regressive models in a massive fMRI dataset

Fraçois Paugam

Basile Pinsard

Lune Bellec

Abstract Dense functional magnetic resonance imaging datasets open new avenues to create auto-regressive models of brain activity. Individua… (see more)l idiosyncrasies are obscured by group models, but can be captured by purely individual models given sufficient amounts of training data. In this study, we compared several deep and shallow individual models on the temporal auto-regression of BOLD time-series recorded during a natural video-watching task. The best performing models were then analyzed in terms of their data requirements and scaling, subject specificity, and the space-time structure of their predicted dynamics. We found the Chebnets, a type of graph convolutional neural network, to be best suited for temporal BOLD auto-regression, closely followed by linear models. Chebnets demonstrated an increase in performance with increasing amounts of data, with no complete saturation at 9 h of training data. Good generalization to other kinds of video stimuli and to resting-state data marked the Chebnets’ ability to capture intrinsic brain dynamics rather than only stimulus-specific autocorrelation patterns. Significant subject specificity was found at short prediction time lags. The Chebnets were found to capture lower frequencies at longer prediction time lags, and the spatial correlations in predicted dynamics were found to match traditional functional connectivity networks. Overall, these results demonstrate that large individual functional magnetic resonance imaging (fMRI) datasets can be used to efficiently train purely individual auto-regressive models of brain activity, and that massive amounts of individual data are required to do so. The excellent performance of the Chebnets likely reflects their ability to combine spatial and temporal interactions on large time scales at a low complexity cost. The non-linearities of the models did not appear as a key advantage. In fact, surprisingly, linear versions of the Chebnets appeared to outperform the original non-linear ones. Individual temporal auto-regressive models have the potential to improve the predictability of the BOLD signal. This study is based on a massive, publicly-available dataset, which can serve for future benchmarks of individual auto-regressive modeling.

2024-07-01

Imaging Neuroscience (published)

Using neural biomarkers to personalize dosing of vagus nerve stimulation

Antonin Berthon

Lorenz Wernisch

Myrta Stoukidi

Michael Thornton

Olivier Tessier-Lariviere

Pascal Fortier-Poisson

Jorin Mamen

Max Pinkney

Susannah Lee

Elvijs Sarkans

Luca Annecchino

Ben Appleton

Philip Garsed

Bret Patterson

Samuel Gonshaw

Matjaž Jakopec

Sudhakaran Shunmugam

Tristan Edwards

Aleksi Tukiainen

Joel Jennings … (see 3 more)

Emil Hewage

Oliver Armitage

2024-06-17

Bioelectronic Medicine (published)

Expressivity of Neural Networks with Fixed Weights and Learned Biases

Ezekiel Williams

Avery Hee-Woon Ryoo

Thomas Jiralerspong

Alexandre Payeur

Matt Perich

Luca Mazzucato

2024-06-16

ICML.cc/2024/Workshop/HiLD (poster)

Does learning the right latent variables necessarily improve in-context learning?

Sarthak Mittal

Eric Elmoznino

Leo Gagnon

Sangnie Bhardwaj

Large autoregressive models like Transformers can solve tasks through in-context learning (ICL) without learning new weights, suggesting ave… (see more)nues for efficiently solving new tasks. For many tasks, e.g., linear regression, the data factorizes: examples are independent given a task latent that generates the data, e.g., linear coefficients. While an optimal predictor leverages this factorization by inferring task latents, it is unclear if Transformers implicitly do so or if they instead exploit heuristics and statistical shortcuts enabled by attention layers. Both scenarios have inspired active ongoing work. In this paper, we systematically investigate the effect of explicitly inferring task latents. We minimally modify the Transformer architecture with a bottleneck designed to prevent shortcuts in favor of more structured solutions, and then compare performance against standard Transformers across various ICL tasks. Contrary to intuition and some recent works, we find little discernible difference between the two; biasing towards task-relevant latent variables does not lead to better out-of-distribution performance, in general. Curiously, we find that while the bottleneck effectively learns to extract latent task variables from context, downstream processing struggles to utilize them for robust prediction. Our study highlights the intrinsic limitations of Transformers in achieving structured ICL solutions that generalize, and shows that while inferring the right latents aids interpretability, it is not sufficient to alleviate this problem.

2024-05-29

ArXiv (preprint)

Assistive sensory-motor perturbations influence learned neural representations

Pavithra Rajeswaran

Alexandre Payeur

Amy L. Orsborn

Task errors are used to learn and refine motor skills. We investigated how task assistance influences learned neural representations using B… (see more)rain-Computer Interfaces (BCIs), which map neural activity into movement via a decoder. We analyzed motor cortex activity as monkeys practiced BCI with a decoder that adapted to improve or maintain performance over days. Population dimensionality remained constant or increased with learning, counter to trends with non-adaptive BCIs. Yet, over time, task information was contained in a smaller subset of neurons or population modes. Moreover, task information was ultimately stored in neural modes that occupied a small fraction of the population variance. An artificial neural network model suggests the adaptive decoders contribute to forming these compact neural representations. Our findings show that assistive decoders manipulate error information used for long-term learning computations, like credit assignment, which informs our understanding of motor learning and has implications for designing real-world BCIs.

2024-03-20

bioRxiv (preprint)

Online Bayesian optimization of vagus nerve stimulation.

Lorenz Wernisch

Tristan Edwards

Antonin Berthon

Olivier Tessier-Lariviere

Elvijs Sarkans

Myrta Stoukidi

Pascal Fortier-Poisson

Max Pinkney

Michael Thornton

Catherine Hanley

Susannah Lee

Joel Jennings

Ben Appleton

Philip Garsed

Bret Patterson

Buttinger Will

Samuel Gonshaw

Matjaž Jakopec

Sudhakaran Shunmugam

Jorin Mamen … (see 4 more)

Aleksi Tukiainen

Oliver Armitage

Emil Hewage

OBJECTIVE In bioelectronic medicine, neuromodulation therapies induce neural signals to the brain or organs, modifying their function. Stimu… (see more)lation devices capable of triggering exogenous neural signals using electrical waveforms require a complex and multi-dimensional parameter space to control such waveforms. Determining the best combination of parameters (waveform optimization or dosing) for treating a particular patient's illness is therefore challenging. Comprehensive parameter searching for an optimal stimulation effect is often infeasible in a clinical setting due to the size of the parameter space. Restricting this space, however, may lead to suboptimal therapeutic results, reduced responder rates, and adverse effects. Approach. As an alternative to a full parameter search, we present a flexible machine learning, data acquisition, and processing framework for optimizing neural stimulation parameters, requiring as few steps as possible using Bayesian optimization. This optimization builds a model of the neural and physiological responses to stimulations, enabling it to optimize stimulation parameters and provide estimates of the accuracy of the response model. The vagus nerve innervates, among other thoracic and visceral organs, the heart, thus controlling heart rate, making it an ideal candidate for demonstrating the effectiveness of our approach. Main results. The efficacy of our optimization approach was first evaluated on simulated neural responses, then applied to vagus nerve stimulation intraoperatively in porcine subjects. Optimization converged quickly on parameters achieving target heart rates and optimizing neural B-fiber activations despite high intersubject variability. Significance. An optimized stimulation waveform was achieved in real time with far fewer stimulations than required by alternative optimization strategies, thus minimizing exposure to side effects. Uncertainty estimates helped avoiding stimulations outside a safe range. Our approach shows that a complex set of neural stimulation parameters can be optimized in real-time for a patient to achieve a personalized precision dosing. .

2024-03-13

Journal of Neural Engineering (published)

Explicit Knowledge Factorization Meets In-Context Learning: What Do We Gain?

Sarthak Mittal

Eric Elmoznino

Leo Gagnon

Sangnie Bhardwaj

2024-03-05

ICLR.cc/2024/Workshop/R2-FM (poster)

Learning and Aligning Structured Random Feature Networks

Vivian White

Muawiz Sajjad Chaudhary

Guy Wolf

Kameron Decker Harris

Artificial neural networks (ANNs) are considered "black boxes'' due to the difficulty of interpreting their learned weights. While choosing… (see more) the best features is not well understood, random feature networks (RFNs) and wavelet scattering ground some ANN learning mechanisms in function space with tractable mathematics. Meanwhile, the genetic code has evolved over millions of years, shaping the brain to develop variable neural circuits with reliable structure that resemble RFNs. We explore a similar approach, embedding neuro-inspired, wavelet-like weights into multilayer RFNs. These can outperform scattering and have kernels that describe their function space at large width. We build learnable and deeper versions of these models where we can optimize separate spatial and channel covariances of the convolutional weight distributions. We find that these networks can perform comparatively with conventional ANNs while dramatically reducing the number of trainable parameters. Channel covariances are most influential, and both weight and activation alignment are needed for classification performance. Our work outlines how neuro-inspired configurations may lead to better performance in key cases and offers a potentially tractable reduced model for ANN learning.

2024-03-02

ICLR.cc/2024/Workshop/Re-Align (poster)

Learning and Aligning Structured Random Feature Networks

Vivian White

Muawiz Sajjad Chaudhary

Guy Wolf

Kameron Decker Harris

Artificial neural networks (ANNs) are considered ``black boxes'' due to the difficulty of interpreting their learned weights. While choosin… (see more)g the best features is not well understood, random feature networks (RFNs) and wavelet scattering ground some ANN learning mechanisms in function space with tractable mathematics. Meanwhile, the genetic code has evolved over millions of years, shaping the brain to devlop variable neural circuits with reliable structure that resemble RFNs. We explore a similar approach, embedding neuro-inspired, wavelet-like weights into multilayer RFNs. These can outperform scattering and have kernels that describe their function space at large width. We build learnable and deeper versions of these models where we can optimize separate spatial and channel covariances of the convolutional weight distributions. We find that these networks can perform comparatively with conventional ANNs while dramatically reducing the number of trainable parameters. Channel covariances are most influential, and both weight and activation alignment are needed for classification performance. Our work outlines how neuro-inspired configurations may lead to better performance in key cases and offers a potentially tractable reduced model for ANN learning.

2024-03-02

ICLR.cc/2024/Workshop/Re-Align (poster)