Portrait de Guillaume Lajoie

Guillaume Lajoie

Membre académique principal
Chaire en IA Canada-CIFAR
Professeur agrégé, Université de Montréal, Département de mathématiques et statistiques
Chercheur invité, Google
Sujets de recherche
Apprentissage de représentations
Apprentissage profond
Cognition
IA en santé
IA pour la science
Neurosciences computationnelles
Optimisation
Raisonnement
Réseaux de neurones récurrents
Systèmes dynamiques

Biographie

Guillaume Lajoie est professeur agrégé au Département de mathématiques et de statistiques (DMS) de l'Université de Montréal et membre académique principal de Mila – Institut québécois d’intelligence artificielle. Il est titulaire d'une chaire CIFAR (CCAI Canada) ainsi que d'une chaire de recherche du Canada (CRC) en calcul et interfaçage neuronaux.

Ses recherches sont positionnées à l'intersection de l'IA et des neurosciences où il développe des outils pour mieux comprendre les mécanismes d'intelligence communs aux systèmes biologiques et artificiels. Les contributions de son groupe de recherche vont des progrès des paradigmes d'apprentissage à plusieurs échelles pour les grands systèmes artificiels aux applications en neurotechnologie. Dr. Lajoie participe activement aux efforts de développement responsables de l'IA, cherchant à identifier les lignes directrices et les meilleures pratiques pour l'utilisation de l'IA dans la recherche et au-delà.

Étudiants actuels

Collaborateur·rice de recherche - ETH Zurich
Collaborateur·rice alumni - Polytechnique
Doctorat - UdeM
Co-superviseur⋅e :
Postdoctorat - UdeM
Co-superviseur⋅e :
Doctorat - UdeM
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Postdoctorat - McGill
Superviseur⋅e principal⋅e :
Maîtrise recherche - Polytechnique
Superviseur⋅e principal⋅e :
Visiteur de recherche indépendant - McGill
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Co-superviseur⋅e :
Maîtrise recherche - UdeM
Co-superviseur⋅e :
Doctorat - McGill
Superviseur⋅e principal⋅e :
Stagiaire de recherche - Concordia
Co-superviseur⋅e :
Doctorat - UdeM
Co-superviseur⋅e :
Doctorat - UdeM
Co-superviseur⋅e :
Visiteur de recherche indépendant - UdeM
Maîtrise recherche - UdeM
Maîtrise recherche - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Co-superviseur⋅e :
Postdoctorat - UdeM
Visiteur de recherche indépendant - University of South California

Publications

Discrete, compositional, and symbolic representations through attractor dynamics
Andrew Nam
Nikolay Malkin
Chen Sun
Compositionality is an important feature of discrete symbolic systems, such as language and programs, as it enables them to have infinite ca… (voir plus)pacity despite a finite symbol set. It serves as a useful abstraction for reasoning in both cognitive science and in AI, yet the interface between continuous and symbolic processing is often imposed by fiat at the algorithmic level, such as by means of quantization or a softmax sampling step. In this work, we explore how discretization could be implemented in a more neurally plausible manner through the modeling of attractor dynamics that partition the continuous representation space into basins that correspond to sequences of symbols. Building on established work in attractor networks and introducing novel training methods, we show that imposing structure in the symbolic space can produce compositionality in the attractor-supported representation space of rich sensory inputs. Lastly, we argue that our model exhibits the process of an information bottleneck that is thought to play a role in conscious experience, decomposing the rich information of a sensory input into stable components encoding symbolic information.
Online Bayesian Optimization of Nerve Stimulation
Lorenz Wernisch
Tristan Edwards
Antonin Berthon
Elvijs Sarkans
Myrta Stoukidi
Pascal Fortier-Poisson
Max Pinkney
Michael Thornton
Catherine Hanley
Susannah Lee
Joel Jennings
Ben Appleton
Phillip Garsed
Bret Patterson
Buttinger Will
Samuel Gonshaw
Matjaž Jakopec
Sudhakaran Shunmugam
Aleksi Tukiainen
Oliver Armitage
Emil Hewage
In bioelectronic medicine, neuromodulation therapies induce neural signals to the brain or organs modifying their function. Stimulation devi… (voir plus)ces, capable of triggering exogenous neural signals using electrical wave forms, require a complex and multi-dimensional parameter space in order to control such wave forms. Determining the best combination of parameters (wave form optimization, or dosing) for treating a particular patient’s illness is therefore challenging. Comprehensive parameter searching for an optimal stimulation effect is often infeasible in a clinical setting, due to the size of the parameter space. Restricting this space, however, may lead to sub-optimal therapeutic results, reduced responder rates, and adverse effects. As an alternative to a full parameter search, we present a flexible machine learning, data acquisition and processing framework for optimizing neural stimulation parameters requiring as few steps as possible using Bayesian optimization. Such optimization builds a model of the neural and physiological responses to stimulations enabling it to optimize stimulation parameters and to provide estimates of the accuracy of the response model. The vagus nerve innervates, among other thoracic and visceral organs, the heart, thus controlling heart rate and is therefore ideal for demonstrating the effectiveness of our approach. Main results. The efficacy of our optimization approach was first evaluated on simulated neural responses, then applied to vagus nerve stimulation intraoperatively in porcine subjects. Optimization converged quickly on parameters achieving target heart rates and optimizing neural B-fibre activations despite high intersubject variability. An optimized stimulation waveform was achieved in real time with far fewer stimulations than required by alternative optimization strategies, thus minimizing exposure to side effects. Uncertainty estimates helped avoiding stimulations outside a safe range. Our approach shows that a complex set of neural stimulation parameters can be optimized in real-time for a patient to achieve a personalized precision dosing.
Flexible Phase Dynamics for Bio-Plausible Contrastive Learning
Many learning algorithms used as normative models in neuroscience or as candidate approaches for learning on neuromorphic chips learn by con… (voir plus)trasting one set of network states with another. These Contrastive Learning (CL) algorithms are traditionally implemented with rigid, temporally non-local, and periodic learning dynamics that could limit the range of physical systems capable of harnessing CL. In this study, we build on recent work exploring how CL might be implemented by biological or neurmorphic systems and show that this form of learning can be made temporally local, and can still function even if many of the dynamical requirements of standard training procedures are relaxed. Thanks to a set of general theorems corroborated by numerical experiments across several CL models, our results provide theoretical foundations for the study and development of CL methods for biological and neuromorphic neural networks.
Exploring Exchangeable Dataset Amortization for Bayesian Posterior Inference
Niels Leif Bracher
Priyank Jaini
Marcus A Brubaker
Bayesian inference provides a natural way of incorporating uncertainties and different underlying theories when making predictions or analyz… (voir plus)ing complex systems. However, it requires computationally expensive routines for approximation, which have to be re-run when new data is observed and are thus infeasible to efficiently scale and reuse. In this work, we look at the problem from the perspective of amortized inference to obtain posterior parameter distributions for known probabilistic models. We propose a neural network-based approach that can handle exchangeable observations and amortize over datasets to convert the problem of Bayesian posterior inference into a single forward pass of a network. Our empirical analyses explore various design choices for amortized inference by comparing: (a) our proposed variational objective with forward KL minimization, (b) permutation-invariant architectures like Transformers and DeepSets, and (c) parameterizations of posterior families like diagonal Gaussian and Normalizing Flows. Through our experiments, we successfully apply amortization techniques to estimate the posterior distributions for different domains solely through inference.
Learning to Optimize with Recurrent Hierarchical Transformers
Use of Invasive Brain-Computer Interfaces in Pediatric Neurosurgery: Technical and Ethical Considerations
David Bergeron
Christian Iorio-Morin
Nathalie Orr Gaucher
Éric Racine
Alexander G. Weil
Autonomous optimization of neuroprosthetic stimulation parameters that drive the motor cortex and spinal cord outputs in rats and monkeys
Sandrine L. Côté
Elena Massai
Parikshat Sirpal
Stephan Quessy
Marina Martinez
Numa Dancause
Neural stimulation can alleviate paralysis and sensory deficits. Novel high-density neural interfaces can enable refined and multipronged ne… (voir plus)urostimulation interventions. To achieve this, it is essential to develop algorithmic frameworks capable of handling optimization in large parameter spaces. Here, we leveraged an algorithmic class, Gaussian-process (GP)-based Bayesian optimization (BO), to solve this problem. We show that GP-BO efficiently explores the neurostimulation space, outperforming other search strategies after testing only a fraction of the possible combinations. Through a series of real-time multi-dimensional neurostimulation experiments, we demonstrate optimization across diverse biological targets (brain, spinal cord), animal models (rats, non-human primates), in healthy subjects, and in neuroprosthetic intervention after injury, for both immediate and continual learning over multiple sessions. GP-BO can embed and improve “prior” expert/clinical knowledge to dramatically enhance its performance. These results advocate for broader establishment of learning agents as structural elements of neuroprosthetic design, enabling personalization and maximization of therapeutic effectiveness.
Neural manifolds and learning regimes in neural-interface tasks
Neural activity tends to reside on manifolds whose dimension is lower than the dimension of the whole neural state space. Experiments using … (voir plus)brain-computer interfaces (BCIs) with microelectrode arrays implanted in the motor cortex of nonhuman primates have provided ways to test whether neural manifolds influence learning-related neural computations. Starting from a learned BCI-controlled motor task, these experiments explored the effect of changing the BCI decoder to implement perturbations that were either “aligned” or not with the pre-existing neural manifold. In a series of studies, researchers found that within-manifold perturbations (WMPs) evoked fast reassociations of existing neural patterns for rapid adaptation, while outside-manifold perturbations (OMPs) triggered a slower adaptation process that led to the emergence of new neural patterns. Together, these findings have been interpreted as suggesting that these different rates of adaptation might be associated with distinct learning mechanisms. Here, we investigated whether gradient-descent learning could alone explain these differences. Using an idealized model that captures the fixed-point dynamics of recurrent neural networks, we uncovered gradient-based learning dynamics consistent with experimental findings. Crucially, this experimental match arose only when the network was initialized in a lazier learning regime, a concept inherited from deep learning theory. A lazy learning regime—in contrast with a rich regime—implies small changes on synaptic strengths throughout learning. For OMPs, these small changes were less effective at increasing performance and could lead to unstable adaptation with a heightened sensitivity to learning rates. For WMPs, they helped reproduce the reassociation mechanism on short adaptation time scales, especially with large input variances. Since gradient descent has many biologically plausible variants, our findings establish lazy gradient-based learning as a plausible mechanism for adaptation under network-level constraints and unify several experimental results from the literature.
Transfer Entropy Bottleneck: Learning Sequence to Sequence Information Transfer
Pascal Fortier-Poisson
Blake Aaron Richards
When presented with a data stream of two statistically dependent variables, predicting the future of one of the variables (the target stream… (voir plus)) can benefit from information about both its history and the history of the other variable (the source stream). For example, fluctuations in temperature at a weather station can be predicted using both temperatures and barometric readings. However, a challenge when modelling such data is that it is easy for a neural network to rely on the greatest joint correlations within the target stream, which may ignore a crucial but small information transfer from the source to the target stream. As well, there are often situations where the target stream may have previously been modelled independently and it would be useful to use that model to inform a new joint model. Here, we develop an information bottleneck approach for conditional learning on two dependent streams of data. Our method, which we call Transfer Entropy Bottleneck (TEB), allows one to learn a model that bottlenecks the directed information transferred from the source variable to the target variable, while quantifying this information transfer within the model. As such, TEB provides a useful new information bottleneck approach for modelling two statistically dependent streams of data in order to make predictions about one of them.
Steerable Equivariant Representation Learning
Willie McClinton
Tongzhou Wang
Chen Sun
Phillip Isola
Dilip Krishnan
Pre-trained deep image representations are useful for post-training tasks such as classification through transfer learning, image retrieval,… (voir plus) and object detection. Data augmentations are a crucial aspect of pre-training robust representations in both supervised and self-supervised settings. Data augmentations explicitly or implicitly promote invariance in the embedding space to the input image transformations. This invariance reduces generalization to those downstream tasks which rely on sensitivity to these particular data augmentations. In this paper, we propose a method of learning representations that are instead equivariant to data augmentations. We achieve this equivariance through the use of steerable representations. Our representations can be manipulated directly in embedding space via learned linear maps. We demonstrate that our resulting steerable and equivariant representations lead to better performance on transfer learning and robustness: e.g. we improve linear probe top-1 accuracy by between 1% to 3% for transfer; and ImageNet-C accuracy by upto 3.4%. We further show that the steerability of our representations provides significant speedup (nearly 50x) for test-time augmentations; by applying a large number of augmentations for out-of-distribution detection, we significantly improve OOD AUC on the ImageNet-C dataset over an invariant representation.
How Gradient Estimator Variance and Bias Could Impact Learning in Neural Circuits
Yuhan Helena Liu
Konrad Kording
Blake A. Richards
Reliability of CKA as a Similarity Measure in Deep Learning
Comparing learned neural representations in neural networks is a challenging but important problem, which has been approached in different w… (voir plus)ays. The Centered Kernel Alignment (CKA) similarity metric, particularly its linear variant, has recently become a popular approach and has been widely used to compare representations of a network's different layers, of architecturally similar networks trained differently, or of models with different architectures trained on the same data. A wide variety of conclusions about similarity and dissimilarity of these various representations have been made using CKA. In this work we present analysis that formally characterizes CKA sensitivity to a large class of simple transformations, which can naturally occur in the context of modern machine learning. This provides a concrete explanation of CKA sensitivity to outliers, which has been observed in past works, and to transformations that preserve the linear separability of the data, an important generalization attribute. We empirically investigate several weaknesses of the CKA similarity metric, demonstrating situations in which it gives unexpected or counter-intuitive results. Finally we study approaches for modifying representations to maintain functional behaviour while changing the CKA value. Our results illustrate that, in many cases, the CKA value can be easily manipulated without substantial changes to the functional behaviour of the models, and call for caution when leveraging activation alignment metrics.