Guillaume Lajoie

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, Université de Montréal, Department of Mathematics and Statistics
Visiting Researcher, Google
Research Topics
AI for Science
AI in Health
Cognition
Computational Neuroscience
Deep Learning
Dynamical Systems
Optimization
Reasoning
Recurrent Neural Networks
Representation Learning

Biography

Guillaume Lajoie is an Associate Professor in the Department of Mathematics and Statistics at Université de Montréal and a Core Academic Member of Mila – Quebec Artificial Intelligence Institute. He holds a Canada CIFAR AI Chair and a Canada Research Chair (CRC) in Neural Computation and Interfacing.

His research sits at the intersection of AI and neuroscience, where he develops tools to better understand mechanisms of intelligence common to both biological and artificial systems. His research group's contributions range from advances in multi-scale learning paradigms for large artificial systems to applications in neurotechnology. Dr. Lajoie is actively involved in responsible AI development efforts, seeking to identify guidelines and best practices for the use of AI in research and beyond.

Current Students

Collaborating researcher - ETH Zurich
Collaborating Alumni - Polytechnique Montréal
PhD - Université de Montréal
Co-supervisor :
Postdoctorate - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Postdoctorate - McGill University
Principal supervisor :
Master's Research - Polytechnique Montréal
Principal supervisor :
PhD - Université de Montréal
Independent visiting researcher - McGill University
PhD - McGill University
Principal supervisor :
PhD - Université de Montréal
Co-supervisor :
Master's Research - Université de Montréal
Co-supervisor :
PhD - McGill University
Principal supervisor :
Research Intern - Concordia University
Co-supervisor :
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Co-supervisor :
Independent visiting researcher - Université de Montréal
Master's Research - Université de Montréal
Master's Research - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Co-supervisor :
Postdoctorate - Université de Montréal
PhD - Université de Montréal
Independent visiting researcher - University of Southern California

Publications

Discrete, compositional, and symbolic representations through attractor dynamics
Andrew Nam
Nikolay Malkin
Chen Sun
Compositionality is an important feature of discrete symbolic systems, such as language and programs, as it enables them to have infinite capacity despite a finite symbol set. It serves as a useful abstraction for reasoning in both cognitive science and in AI, yet the interface between continuous and symbolic processing is often imposed by fiat at the algorithmic level, such as by means of quantization or a softmax sampling step. In this work, we explore how discretization could be implemented in a more neurally plausible manner through the modeling of attractor dynamics that partition the continuous representation space into basins that correspond to sequences of symbols. Building on established work in attractor networks and introducing novel training methods, we show that imposing structure in the symbolic space can produce compositionality in the attractor-supported representation space of rich sensory inputs. Lastly, we argue that our model exhibits the process of an information bottleneck that is thought to play a role in conscious experience, decomposing the rich information of a sensory input into stable components encoding symbolic information.
Online Bayesian Optimization of Nerve Stimulation
Lorenz Wernisch
Tristan Edwards
Antonin Berthon
Elvijs Sarkans
Myrta Stoukidi
Pascal Fortier-Poisson
Max Pinkney
Michael Thornton
Catherine Hanley
Susannah Lee
Joel Jennings
Ben Appleton
Phillip Garsed
Bret Patterson
Buttinger Will
Samuel Gonshaw
Matjaž Jakopec
Sudhakaran Shunmugam
Aleksi Tukiainen
Oliver Armitage
Emil Hewage
In bioelectronic medicine, neuromodulation therapies induce neural signals to the brain or organs, modifying their function. Stimulation devices, capable of triggering exogenous neural signals using electrical waveforms, require a complex and multi-dimensional parameter space in order to control such waveforms. Determining the best combination of parameters (waveform optimization, or dosing) for treating a particular patient’s illness is therefore challenging. Comprehensive parameter searching for an optimal stimulation effect is often infeasible in a clinical setting, due to the size of the parameter space. Restricting this space, however, may lead to sub-optimal therapeutic results, reduced responder rates, and adverse effects. As an alternative to a full parameter search, we present a flexible machine learning, data acquisition and processing framework for optimizing neural stimulation parameters requiring as few steps as possible using Bayesian optimization. Such optimization builds a model of the neural and physiological responses to stimulations, enabling it to optimize stimulation parameters and to provide estimates of the accuracy of the response model. The vagus nerve innervates, among other thoracic and visceral organs, the heart, thus controlling heart rate, and is therefore ideal for demonstrating the effectiveness of our approach. The efficacy of our optimization approach was first evaluated on simulated neural responses, then applied to vagus nerve stimulation intraoperatively in porcine subjects. Optimization converged quickly on parameters achieving target heart rates and optimizing neural B-fibre activations despite high intersubject variability. An optimized stimulation waveform was achieved in real time with far fewer stimulations than required by alternative optimization strategies, thus minimizing exposure to side effects. Uncertainty estimates helped avoid stimulations outside a safe range. Our approach shows that a complex set of neural stimulation parameters can be optimized in real time for a patient to achieve a personalized precision dosing.
Flexible Phase Dynamics for Bio-Plausible Contrastive Learning
Many learning algorithms used as normative models in neuroscience or as candidate approaches for learning on neuromorphic chips learn by contrasting one set of network states with another. These Contrastive Learning (CL) algorithms are traditionally implemented with rigid, temporally non-local, and periodic learning dynamics that could limit the range of physical systems capable of harnessing CL. In this study, we build on recent work exploring how CL might be implemented by biological or neuromorphic systems and show that this form of learning can be made temporally local, and can still function even if many of the dynamical requirements of standard training procedures are relaxed. Thanks to a set of general theorems corroborated by numerical experiments across several CL models, our results provide theoretical foundations for the study and development of CL methods for biological and neuromorphic neural networks.
Exploring Exchangeable Dataset Amortization for Bayesian Posterior Inference
Niels Leif Bracher
Priyank Jaini
Marcus A Brubaker
Bayesian inference provides a natural way of incorporating uncertainties and different underlying theories when making predictions or analyzing complex systems. However, it requires computationally expensive routines for approximation, which have to be re-run when new data is observed and are thus infeasible to efficiently scale and reuse. In this work, we look at the problem from the perspective of amortized inference to obtain posterior parameter distributions for known probabilistic models. We propose a neural network-based approach that can handle exchangeable observations and amortize over datasets to convert the problem of Bayesian posterior inference into a single forward pass of a network. Our empirical analyses explore various design choices for amortized inference by comparing: (a) our proposed variational objective with forward KL minimization, (b) permutation-invariant architectures like Transformers and DeepSets, and (c) parameterizations of posterior families like diagonal Gaussian and Normalizing Flows. Through our experiments, we successfully apply amortization techniques to estimate the posterior distributions for different domains solely through inference.
Learning to Optimize with Recurrent Hierarchical Transformers
Use of Invasive Brain-Computer Interfaces in Pediatric Neurosurgery: Technical and Ethical Considerations
David Bergeron
Christian Iorio-Morin
Nathalie Orr Gaucher
Éric Racine
Alexander G. Weil
Autonomous optimization of neuroprosthetic stimulation parameters that drive the motor cortex and spinal cord outputs in rats and monkeys
Sandrine L. Côté
Elena Massai
Parikshat Sirpal
Stephan Quessy
Marina Martinez
Numa Dancause
Neural stimulation can alleviate paralysis and sensory deficits. Novel high-density neural interfaces can enable refined and multipronged neurostimulation interventions. To achieve this, it is essential to develop algorithmic frameworks capable of handling optimization in large parameter spaces. Here, we leveraged an algorithmic class, Gaussian-process (GP)-based Bayesian optimization (BO), to solve this problem. We show that GP-BO efficiently explores the neurostimulation space, outperforming other search strategies after testing only a fraction of the possible combinations. Through a series of real-time multi-dimensional neurostimulation experiments, we demonstrate optimization across diverse biological targets (brain, spinal cord), animal models (rats, non-human primates), in healthy subjects, and in neuroprosthetic intervention after injury, for both immediate and continual learning over multiple sessions. GP-BO can embed and improve “prior” expert/clinical knowledge to dramatically enhance its performance. These results advocate for broader establishment of learning agents as structural elements of neuroprosthetic design, enabling personalization and maximization of therapeutic effectiveness.
Neural manifolds and learning regimes in neural-interface tasks
Neural activity tends to reside on manifolds whose dimension is lower than the dimension of the whole neural state space. Experiments using brain-computer interfaces (BCIs) with microelectrode arrays implanted in the motor cortex of nonhuman primates have provided ways to test whether neural manifolds influence learning-related neural computations. Starting from a learned BCI-controlled motor task, these experiments explored the effect of changing the BCI decoder to implement perturbations that were either “aligned” or not with the pre-existing neural manifold. In a series of studies, researchers found that within-manifold perturbations (WMPs) evoked fast reassociations of existing neural patterns for rapid adaptation, while outside-manifold perturbations (OMPs) triggered a slower adaptation process that led to the emergence of new neural patterns. Together, these findings have been interpreted as suggesting that these different rates of adaptation might be associated with distinct learning mechanisms. Here, we investigated whether gradient-descent learning could alone explain these differences. Using an idealized model that captures the fixed-point dynamics of recurrent neural networks, we uncovered gradient-based learning dynamics consistent with experimental findings. Crucially, this experimental match arose only when the network was initialized in a lazier learning regime, a concept inherited from deep learning theory. A lazy learning regime, in contrast with a rich regime, implies small changes in synaptic strengths throughout learning. For OMPs, these small changes were less effective at increasing performance and could lead to unstable adaptation with a heightened sensitivity to learning rates. For WMPs, they helped reproduce the reassociation mechanism on short adaptation time scales, especially with large input variances. Since gradient descent has many biologically plausible variants, our findings establish lazy gradient-based learning as a plausible mechanism for adaptation under network-level constraints and unify several experimental results from the literature.
Transfer Entropy Bottleneck: Learning Sequence to Sequence Information Transfer
Pascal Fortier-Poisson
Blake Aaron Richards
When presented with a data stream of two statistically dependent variables, predicting the future of one of the variables (the target stream) can benefit from information about both its history and the history of the other variable (the source stream). For example, fluctuations in temperature at a weather station can be predicted using both temperatures and barometric readings. However, a challenge when modelling such data is that it is easy for a neural network to rely on the greatest joint correlations within the target stream, which may ignore a crucial but small information transfer from the source to the target stream. As well, there are often situations where the target stream may have previously been modelled independently and it would be useful to use that model to inform a new joint model. Here, we develop an information bottleneck approach for conditional learning on two dependent streams of data. Our method, which we call Transfer Entropy Bottleneck (TEB), allows one to learn a model that bottlenecks the directed information transferred from the source variable to the target variable, while quantifying this information transfer within the model. As such, TEB provides a useful new information bottleneck approach for modelling two statistically dependent streams of data in order to make predictions about one of them.
Steerable Equivariant Representation Learning
Willie McClinton
Tongzhou Wang
Chen Sun
Phillip Isola
Dilip Krishnan
Pre-trained deep image representations are useful for post-training tasks such as classification through transfer learning, image retrieval, and object detection. Data augmentations are a crucial aspect of pre-training robust representations in both supervised and self-supervised settings. Data augmentations explicitly or implicitly promote invariance in the embedding space to the input image transformations. This invariance reduces generalization to those downstream tasks which rely on sensitivity to these particular data augmentations. In this paper, we propose a method of learning representations that are instead equivariant to data augmentations. We achieve this equivariance through the use of steerable representations. Our representations can be manipulated directly in embedding space via learned linear maps. We demonstrate that our resulting steerable and equivariant representations lead to better performance on transfer learning and robustness: e.g. we improve linear probe top-1 accuracy by between 1% and 3% for transfer; and ImageNet-C accuracy by up to 3.4%. We further show that the steerability of our representations provides significant speedup (nearly 50x) for test-time augmentations; by applying a large number of augmentations for out-of-distribution detection, we significantly improve OOD AUC on the ImageNet-C dataset over an invariant representation.
How Gradient Estimator Variance and Bias Could Impact Learning in Neural Circuits
Yuhan Helena Liu
Konrad Kording
Blake A. Richards
Reliability of CKA as a Similarity Measure in Deep Learning
Comparing learned neural representations in neural networks is a challenging but important problem, which has been approached in different ways. The Centered Kernel Alignment (CKA) similarity metric, particularly its linear variant, has recently become a popular approach and has been widely used to compare representations of a network's different layers, of architecturally similar networks trained differently, or of models with different architectures trained on the same data. A wide variety of conclusions about similarity and dissimilarity of these various representations have been made using CKA. In this work we present analysis that formally characterizes CKA sensitivity to a large class of simple transformations, which can naturally occur in the context of modern machine learning. This provides a concrete explanation of CKA sensitivity to outliers, which has been observed in past works, and to transformations that preserve the linear separability of the data, an important generalization attribute. We empirically investigate several weaknesses of the CKA similarity metric, demonstrating situations in which it gives unexpected or counter-intuitive results. Finally we study approaches for modifying representations to maintain functional behaviour while changing the CKA value. Our results illustrate that, in many cases, the CKA value can be easily manipulated without substantial changes to the functional behaviour of the models, and call for caution when leveraging activation alignment metrics.
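
For readers unfamiliar with the metric discussed in the abstract above, the following is a minimal sketch of linear CKA in plain Python, using its standard definition on column-centered activation matrices: the squared Frobenius norm of Y^T X divided by the product of the Frobenius norms of X^T X and Y^T Y. It is not code from the paper. The function names (`center_columns`, `cross_frobenius_sq`, `linear_cka`) and the toy data are hypothetical, chosen only for illustration.

```python
import math

def center_columns(m):
    """Subtract each column's mean (rows are examples, columns are features)."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]

def cross_frobenius_sq(a, b):
    """Squared Frobenius norm of a^T b for two matrices with the same number of rows."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(p * q for p, q in zip(col_a, col_b))
            total += dot * dot
    return total

def linear_cka(x, y):
    """Linear CKA between two representations of the same examples."""
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_frobenius_sq(yc, xc)
    denominator = math.sqrt(cross_frobenius_sq(xc, xc) * cross_frobenius_sq(yc, yc))
    return numerator / denominator

# Toy data (hypothetical): four examples, two features each.
x = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
rotated = [[-b, a] for a, b in x]          # 90-degree rotation of the features
scaled = [[3.0 * a, 3.0 * b] for a, b in x]  # isotropic rescaling
other = [[0.5, 1.0], [2.0, 0.0], [1.0, 1.0], [0.0, 3.0]]

print(linear_cka(x, rotated))  # equals 1 up to floating-point error
print(linear_cka(x, scaled))   # equals 1 up to floating-point error
print(linear_cka(x, other))    # some value between 0 and 1
```

The first two comparisons show the known invariances of linear CKA to orthogonal transformations and isotropic scaling. The transformations analysed in the paper (outliers, separability-preserving maps) are not reproduced here.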