Guillaume Lajoie

Biography

Guillaume Lajoie is an associate professor in the Department of Mathematics and Statistics (DMS) at Université de Montréal and a core academic member of Mila, the Quebec Artificial Intelligence Institute. He holds a Canada CIFAR AI Chair (CCAI Canada) as well as a Canada Research Chair (CRC) in Neural Computation and Interfacing.

His research sits at the intersection of AI and neuroscience, where he develops tools to better understand the mechanisms of intelligence common to biological and artificial systems. His research group's contributions range from advances in multi-scale learning paradigms for large artificial systems to applications in neurotechnology. Dr. Lajoie is actively involved in responsible AI development efforts, seeking to identify guidelines and best practices for the use of AI in research and beyond.

Current Students

Federico Arangath Joseph

Research collaborator - ETH Zurich

Stefan Bauer

Independent visiting researcher

Principal supervisor:

Yoshua Bengio

Sangnie Bhardwaj

PhD - UdeM

Co-supervisor:

Hugo Larochelle

Colin Bredenberg

Postdoctorate - UdeM

Co-supervisor:

Blake Richards

Leo Choiniere

PhD - UdeM

Olivier Codol

Postdoctorate - UdeM

Co-supervisor:

PhD - UdeM

Principal supervisor:

PhD - UdeM

Principal supervisor:

Leo Gagnon

PhD - UdeM

Skylar Gu

Research intern - McGill

Principal supervisor:

Juan Guerra

Research Master's - Polytechnique

Principal supervisor:

Marco Bonizzato

Nanda Harishankar Krishna

PhD - UdeM

Research collaborator - Western Washington University (faculty; assistant prof)

Principal supervisor:

PhD - UdeM

Co-supervisor:

Research Master's - UdeM

Co-supervisor:

tejaskasetty@gmail.com

Ximeng Mao

PhD - UdeM

Co-supervisor:

Joelle Pineau

Abdel Mfougouon Njupoun

PhD - UdeM

Co-supervisor:

PhD - UdeM

Co-supervisor:

Amine Natik

PhD - UdeM

Co-supervisor:

Guy Wolf

Alexandre Payeur

Research collaborator - UdeM

Mohammad Pezeshki

Research collaborator

Principal supervisor:

Alumni collaborator - McGill

Principal supervisor:

Julia Price

Research Master's - UdeM

Param Raval

Alumni collaborator - UdeM

Avery Ryoo

Research Master's - UdeM

Principal supervisor:

PhD - UdeM

Co-supervisor:

Lune Bellec

Ayesha Vermani

Independent visiting researcher - Champalimaud Institute for the Unknown

Ryan Vogt

Postdoctorate - UdeM

Machine learning for segmenting the different nerve fiber activations from neural signals traveling from the brain to the body

Vivian White

Research intern - Western Washington University

Co-supervisor:

PhD - UdeM

Blog Posts

Graphical representation of a vagus nerve

May 21, 2025

by

Param Raval

Olivier Tessier-Larivière

Pascal Fortier-Poisson

Blake Richards

Guillaume Lajoie

June 13, 2024

What do synaptic weight distributions tell us about learning in the brain?

by

Roman Pogodin

Jonathan Cornford

Arna Ghosh

Gauthier Gidel

Guillaume Lajoie

Blake Richards

Publications

When can transformers compositionally generalize in-context?

Seijin Kobayashi

Simon Schug

Yassir Akram

Florian Redhardt

Johannes Von Oswald

Razvan Pascanu

João Sacramento

Many tasks can be composed from a few independent components. This gives rise to a combinatorial explosion of possible tasks, only some of which might be encountered during training. Under what circumstances can transformers compositionally generalize from a subset of tasks to all possible combinations of tasks that share similar components? Here we study a modular multitask setting that allows us to precisely control compositional structure in the data generation process. We present evidence that transformers learning in-context struggle to generalize compositionally on this task despite being in principle expressive enough to do so. Compositional generalization becomes possible only when introducing a bottleneck that enforces an explicit separation between task inference and task execution.

2024-07-17

arXiv (preprint)

arxiv.org

A benchmark of individual auto-regressive models in a massive fMRI dataset

François Paugam

Basile Pinsard

Lune Bellec

Dense functional magnetic resonance imaging datasets open new avenues to create auto-regressive models of brain activity. Individual idiosyncrasies are obscured by group models, but can be captured by purely individual models given sufficient amounts of training data. In this study, we compared several deep and shallow individual models on the temporal auto-regression of BOLD time-series recorded during a natural video-watching task. The best performing models were then analyzed in terms of their data requirements and scaling, subject specificity, and the space-time structure of their predicted dynamics. We found the Chebnets, a type of graph convolutional neural network, to be best suited for temporal BOLD auto-regression, closely followed by linear models. Chebnets demonstrated an increase in performance with increasing amounts of data, with no complete saturation at 9 h of training data. Good generalization to other kinds of video stimuli and to resting-state data marked the Chebnets’ ability to capture intrinsic brain dynamics rather than only stimulus-specific autocorrelation patterns. Significant subject specificity was found at short prediction time lags. The Chebnets were found to capture lower frequencies at longer prediction time lags, and the spatial correlations in predicted dynamics were found to match traditional functional connectivity networks. Overall, these results demonstrate that large individual functional magnetic resonance imaging (fMRI) datasets can be used to efficiently train purely individual auto-regressive models of brain activity, and that massive amounts of individual data are required to do so. The excellent performance of the Chebnets likely reflects their ability to combine spatial and temporal interactions on large time scales at a low complexity cost. The non-linearities of the models did not appear as a key advantage. In fact, surprisingly, linear versions of the Chebnets appeared to outperform the original non-linear ones. Individual temporal auto-regressive models have the potential to improve the predictability of the BOLD signal. This study is based on a massive, publicly-available dataset, which can serve for future benchmarks of individual auto-regressive modeling.

2024-07-01

Imaging Neuroscience (published)

Using neural biomarkers to personalize dosing of vagus nerve stimulation

Antonin Berthon

Lorenz Wernisch

Myrta Stoukidi

Michael Thornton

Olivier Tessier-Lariviere

Pascal Fortier-Poisson

Jorin Mamen

Max Pinkney

Susannah Lee

Elvijs Sarkans

Luca Annecchino

Ben Appleton

Philip Garsed

Bret Patterson

Samuel Gonshaw

Matjaž Jakopec

Sudhakaran Shunmugam

Tristan Edwards

Aleksi Tukiainen

Joel Jennings … (3 more)

Emil Hewage

Oliver Armitage

2024-06-17

Bioelectronic Medicine (published)

Expressivity of Neural Networks with Fixed Weights and Learned Biases

Ezekiel Williams

Avery Hee-Woon Ryoo

Thomas Jiralerspong

Alexandre Payeur

Matt Perich

Luca Mazzucato

2024-06-16

ICML.cc/2024/Workshop/HiLD (poster)

Does learning the right latent variables necessarily improve in-context learning?

Sarthak Mittal

Eric Elmoznino

Leo Gagnon

Sangnie Bhardwaj

Large autoregressive models like Transformers can solve tasks through in-context learning (ICL) without learning new weights, suggesting avenues for efficiently solving new tasks. For many tasks, e.g., linear regression, the data factorizes: examples are independent given a task latent that generates the data, e.g., linear coefficients. While an optimal predictor leverages this factorization by inferring task latents, it is unclear if Transformers implicitly do so or if they instead exploit heuristics and statistical shortcuts enabled by attention layers. Both scenarios have inspired active ongoing work. In this paper, we systematically investigate the effect of explicitly inferring task latents. We minimally modify the Transformer architecture with a bottleneck designed to prevent shortcuts in favor of more structured solutions, and then compare performance against standard Transformers across various ICL tasks. Contrary to intuition and some recent works, we find little discernible difference between the two; biasing towards task-relevant latent variables does not lead to better out-of-distribution performance, in general. Curiously, we find that while the bottleneck effectively learns to extract latent task variables from context, downstream processing struggles to utilize them for robust prediction. Our study highlights the intrinsic limitations of Transformers in achieving structured ICL solutions that generalize, and shows that while inferring the right latents aids interpretability, it is not sufficient to alleviate this problem.

2024-05-29

arXiv (preprint)

Assistive sensory-motor perturbations influence learned neural representations

Pavithra Rajeswaran

Alexandre Payeur

Amy L. Orsborn

Task errors are used to learn and refine motor skills. We investigated how task assistance influences learned neural representations using Brain-Computer Interfaces (BCIs), which map neural activity into movement via a decoder. We analyzed motor cortex activity as monkeys practiced BCI with a decoder that adapted to improve or maintain performance over days. Population dimensionality remained constant or increased with learning, counter to trends with non-adaptive BCIs. Yet, over time, task information was contained in a smaller subset of neurons or population modes. Moreover, task information was ultimately stored in neural modes that occupied a small fraction of the population variance. An artificial neural network model suggests the adaptive decoders contribute to forming these compact neural representations. Our findings show that assistive decoders manipulate error information used for long-term learning computations, like credit assignment, which informs our understanding of motor learning and has implications for designing real-world BCIs.

2024-03-20

bioRxiv (preprint)

Online Bayesian optimization of vagus nerve stimulation.

Lorenz Wernisch

Tristan Edwards

Antonin Berthon

Olivier Tessier-Lariviere

Elvijs Sarkans

Myrta Stoukidi

Pascal Fortier-Poisson

Max Pinkney

Michael Thornton

Catherine Hanley

Susannah Lee

Joel Jennings

Ben Appleton

Philip Garsed

Bret Patterson

Will Buttinger

Samuel Gonshaw

Matjaž Jakopec

Sudhakaran Shunmugam

Jorin Mamen … (4 more)

Aleksi Tukiainen

Oliver Armitage

Emil Hewage

Objective. In bioelectronic medicine, neuromodulation therapies induce neural signals to the brain or organs, modifying their function. Stimulation devices capable of triggering exogenous neural signals using electrical waveforms require a complex and multi-dimensional parameter space to control such waveforms. Determining the best combination of parameters (waveform optimization or dosing) for treating a particular patient's illness is therefore challenging. Comprehensive parameter searching for an optimal stimulation effect is often infeasible in a clinical setting due to the size of the parameter space. Restricting this space, however, may lead to suboptimal therapeutic results, reduced responder rates, and adverse effects. Approach. As an alternative to a full parameter search, we present a flexible machine learning, data acquisition, and processing framework for optimizing neural stimulation parameters, requiring as few steps as possible using Bayesian optimization. This optimization builds a model of the neural and physiological responses to stimulations, enabling it to optimize stimulation parameters and provide estimates of the accuracy of the response model. The vagus nerve innervates, among other thoracic and visceral organs, the heart, thus controlling heart rate, making it an ideal candidate for demonstrating the effectiveness of our approach. Main results. The efficacy of our optimization approach was first evaluated on simulated neural responses, then applied to vagus nerve stimulation intraoperatively in porcine subjects. Optimization converged quickly on parameters achieving target heart rates and optimizing neural B-fiber activations despite high intersubject variability. Significance. An optimized stimulation waveform was achieved in real time with far fewer stimulations than required by alternative optimization strategies, thus minimizing exposure to side effects. Uncertainty estimates helped avoid stimulations outside a safe range. Our approach shows that a complex set of neural stimulation parameters can be optimized in real time for a patient to achieve a personalized precision dosing.

2024-03-13

Journal of Neural Engineering (published)

Explicit Knowledge Factorization Meets In-Context Learning: What Do We Gain?

Sarthak Mittal

Eric Elmoznino

Leo Gagnon

Sangnie Bhardwaj

2024-03-05

ICLR.cc/2024/Workshop/R2-FM (poster)

Learning and Aligning Structured Random Feature Networks

Vivian White

Muawiz Sajjad Chaudhary

Guy Wolf

Kameron Decker Harris

Artificial neural networks (ANNs) are considered "black boxes" due to the difficulty of interpreting their learned weights. While choosing the best features is not well understood, random feature networks (RFNs) and wavelet scattering ground some ANN learning mechanisms in function space with tractable mathematics. Meanwhile, the genetic code has evolved over millions of years, shaping the brain to develop variable neural circuits with reliable structure that resemble RFNs. We explore a similar approach, embedding neuro-inspired, wavelet-like weights into multilayer RFNs. These can outperform scattering and have kernels that describe their function space at large width. We build learnable and deeper versions of these models where we can optimize separate spatial and channel covariances of the convolutional weight distributions. We find that these networks can perform comparatively with conventional ANNs while dramatically reducing the number of trainable parameters. Channel covariances are most influential, and both weight and activation alignment are needed for classification performance. Our work outlines how neuro-inspired configurations may lead to better performance in key cases and offers a potentially tractable reduced model for ANN learning.

2024-03-02

ICLR.cc/2024/Workshop/Re-Align (poster)
