Guillaume Lajoie

Biography

Guillaume Lajoie is an associate professor in the Department of Mathematics and Statistics (DMS) at Université de Montréal and a core academic member of Mila – Quebec Artificial Intelligence Institute. He holds a Canada CIFAR AI Chair (CCAI) and a Canada Research Chair (CRC) in Neural Computation and Interfacing.

His research sits at the intersection of AI and neuroscience, where he develops tools to better understand the mechanisms of intelligence shared by biological and artificial systems. His research group's contributions range from advances in multi-scale learning paradigms for large artificial systems to applications in neurotechnology. Dr. Lajoie is actively involved in responsible AI development efforts, seeking to identify guidelines and best practices for the use of AI in research and beyond.

Current Students

Federico Arangath Joseph

Research Collaborator - ETH Zurich

Stefan Bauer

Independent Visiting Researcher

Principal supervisor:

Yoshua Bengio

Sangnie Bhardwaj

PhD - UdeM

Co-supervisor:

Hugo Larochelle

Colin Bredenberg

Postdoctorate - UdeM

Co-supervisor:

Blake Richards

Leo Choiniere

PhD - UdeM

Olivier Codol

Postdoctorate - UdeM

Co-supervisor:

PhD - UdeM

Principal supervisor:

PhD - UdeM

Principal supervisor:

Leo Gagnon

PhD - UdeM

Juan Guerra

Research Master's - Polytechnique

Principal supervisor:

Marco Bonizzato

Website

Nanda Harishankar Krishna

PhD - UdeM

Research Collaborator - Western Washington University (faculty; assistant professor)

Principal supervisor:

PhD - UdeM

Co-supervisor:

Research Master's - UdeM

Co-supervisor:

Dhanya Sridhar

tejaskasetty@gmail.com

Website

Ximeng Mao

PhD - UdeM

Co-supervisor:

Joelle Pineau

Abdel Mfougouon Njupoun

PhD - UdeM

Co-supervisor:

PhD - UdeM

Co-supervisor:

Amine Natik

PhD - UdeM

Co-supervisor:

Guy Wolf

Alexandre Payeur

Research Collaborator - UdeM

Mohammad Pezeshki

Research Collaborator

Principal supervisor:

Postdoctorate - McGill

Principal supervisor:

Julia Price

Research Master's - UdeM

Param Raval

Alumni Collaborator - UdeM

Avery Ryoo

Research Master's - UdeM

Principal supervisor:

PhD - UdeM

Co-supervisor:

Lune Bellec

Ayesha Vermani

Independent Visiting Researcher - Champalimaud Institute for the Unknown

Ryan Vogt

Postdoctorate - UdeM

What do synaptic weight distributions tell us about learning in the brain?

Vivian White

Research Intern - Western Washington University

Co-supervisor:

PhD - UdeM

Blog Posts

June 13, 2024

by

Roman Pogodin

Jonathan Cornford

Arna Ghosh

Gauthier Gidel

Guillaume Lajoie

Blake Richards

Read the article

Publications

Neural networks with optimized single-neuron adaptation uncover biologically plausible regularization

Victor Geadah

Stefan Horoi

Giancarlo Kerg

Guy Wolf

Neurons in the brain have rich and adaptive input-output properties. Features such as heterogeneous f-I curves and spike frequency adaptation are known to place single neurons in optimal coding regimes when facing changing stimuli. Yet, it is still unclear how brain circuits exploit single-neuron flexibility, and how network-level requirements may have shaped such cellular function. To answer this question, a multi-scaled approach is needed where the computations of single neurons and neural circuits must be considered as a complete system. In this work, we use artificial neural networks to systematically investigate single-neuron input-output adaptive mechanisms, optimized in an end-to-end fashion. Throughout the optimization process, each neuron has the liberty to modify its nonlinear activation function, parametrized to mimic f-I curves of biological neurons, and to learn adaptation strategies to modify activation functions in real-time during a task. We find that such networks show much-improved robustness to noise and changes in input statistics. Importantly, we find that this procedure recovers precise coding strategies found in biological neurons, such as gain scaling and fractional order differentiation/integration. Using tools from dynamical systems theory, we analyze the role of these emergent single-neuron properties and argue that neural diversity and adaptation play an active regularization role, enabling neural circuits to optimally propagate information across time.

2024-12-13

PLOS Computational Biology (published)

Brain-like learning with exponentiated gradients

Jonathan Cornford

Roman Pogodin

Arna Ghosh

Kaiwen Sheng

Brendan A. Bicknell

Olivier Codol

Beverley A. Clark

Blake Richards

2024-10-26

bioRxiv (preprint)

A Complexity-Based Theory of Compositionality

Eric Elmoznino

Thomas Jiralerspong

Yoshua Bengio

Compositionality is believed to be fundamental to intelligence. In humans, it underlies the structure of thought, language, and higher-level reasoning. In AI, compositional representations can enable a powerful form of out-of-distribution generalization, in which a model systematically adapts to novel combinations of known concepts. However, while we have strong intuitions about what compositionality is, there currently exists no formal definition for it that is measurable and mathematical. Here, we propose such a definition, which we call representational compositionality, that accounts for and extends our intuitions about compositionality. The definition is conceptually simple, quantitative, grounded in algorithmic information theory, and applicable to any representation. Intuitively, representational compositionality states that a compositional representation satisfies three properties. First, it must be expressive. Second, it must be possible to re-describe the representation as a function of discrete symbolic sequences with re-combinable parts, analogous to sentences in natural language. Third, the function that relates these symbolic sequences to the representation, analogous to semantics in natural language, must be simple. Through experiments on both synthetic and real world data, we validate our definition of compositionality and show how it unifies disparate intuitions from across the literature in both AI and cognitive science. We also show that representational compositionality, while theoretically intractable, can be readily estimated using standard deep learning tools. Our definition has the potential to inspire the design of novel, theoretically-driven models that better capture the mechanisms of compositional thought.

2024-10-18

ArXiv (preprint)

In-context learning and Occam's razor

Eric Elmoznino

Tom Marty

Tejas Kasetty

Leo Gagnon

Sarthak Mittal

Mahan Fathi

Dhanya Sridhar

A central goal of machine learning is generalization. While the No Free Lunch Theorem states that we cannot obtain theoretical guarantees for generalization without further assumptions, in practice we observe that simple models which explain the training data generalize best: a principle called Occam's razor. Despite the need for simple models, most current approaches in machine learning only minimize the training error, and at best indirectly promote simplicity through regularization or architecture design. Here, we draw a connection between Occam's razor and in-context learning: an emergent ability of certain sequence models like Transformers to learn at inference time from past observations in a sequence. In particular, we show that the next-token prediction loss used to train in-context learners is directly equivalent to a data compression technique called prequential coding, and that minimizing this loss amounts to jointly minimizing both the training error and the complexity of the model that was implicitly learned from context. Our theory and the empirical experiments we use to support it not only provide a normative account of in-context learning, but also elucidate the shortcomings of current in-context learning methods, suggesting ways in which they can be improved. We make our code available at https://github.com/3rdCore/PrequentialCode.

2024-10-17

ArXiv (preprint)

openreview.net

Learning Stochastic Rainbow Networks

Vivian White

Muawiz Sajjad Chaudhary

Guy Wolf

Kameron Decker Harris

Random feature models are a popular approach for studying network learning that can capture important behaviors while remaining simpler than traditional training. Guth et al. [2024] introduced “rainbow” networks which model the distribution of trained weights as correlated random features conditioned on previous layer activity. Sampling new weights from distributions fit to learned networks led to similar performance in entirely untrained networks, and the observed weight covariances were found to be low rank. This provided evidence that random feature models could be extended to some networks away from initialization, but White et al. [2024] failed to replicate their results in the deeper ResNet18 architecture. Here we ask whether the rainbow formulation can succeed in deeper networks by directly training a stochastic ensemble of random features, which we call stochastic rainbow networks. At every gradient descent iteration, new weights are sampled for all intermediate layers and features aligned layer-wise. We find: (1) this approach scales to deeper models, which outperform shallow networks at large widths; (2) ensembling multiple samples from the stochastic model is better than retraining the classifier head; and (3) low-rank parameterization of the learnable weight covariances can approach the accuracy of full-rank networks. This offers more evidence for rainbow and other structured random feature networks as reduced models of deep learning.

2024-10-10

NeurIPS.cc/2024/Workshop/SciForDL (poster)

openreview.net

Brain-like neural dynamics for behavioral control develop through reinforcement learning

Olivier Codol

Nanda H Krishna

M.G. Perich

During development, neural circuits are shaped continuously as we learn to control our bodies. The ultimate goal of this process is to produce neural dynamics that enable the rich repertoire of behaviors we perform with our limbs. What begins as a series of “babbles” coalesces into skilled motor output as the brain rapidly learns to control the body. However, the nature of the teaching signal underlying this normative learning process remains elusive. Here, we test two well-established and biologically plausible theories, supervised learning (SL) and reinforcement learning (RL), that could explain how neural circuits develop the capacity for skilled movements. We trained recurrent neural networks to control a biomechanical model of a primate arm using either SL or RL and compared the resulting neural dynamics to populations of neurons recorded from the motor cortex of monkeys performing the same movements. Intriguingly, only RL-trained networks produced neural activity that matched their biological counterparts in terms of both the geometry and dynamics of population activity. We show that the similarity between RL-trained networks and biological brains depends critically on matching biomechanical properties of the limb. We then demonstrated that monkeys and RL-trained networks, but not SL-trained networks, show a strikingly similar capacity for robust short-term behavioral adaptation to a movement perturbation, indicating a fundamental and general commonality in the neural control policy. Together, our results support the hypothesis that neural dynamics for behavioral control emerge through a process akin to reinforcement learning. The resulting neural circuits offer numerous advantages for adaptable behavioral control over simpler and more efficient learning rules and expand our understanding of how developmental processes shape neural dynamics.

2024-10-06

bioRxiv (preprint)

The oneirogen hypothesis: modeling the hallucinatory effects of classical psychedelics in terms of replay-dependent plasticity mechanisms

Colin Bredenberg

Fabrice Normandin

Blake Richards

2024-09-30

bioRxiv (preprint)

Latent Representation Learning for Multimodal Brain Activity Translation

Arman Afrasiyabi

Dhananjay Bhaskar

Erica Lindsey Busch

Laurent Caplette

Rahul Singh

Nicholas B Turk-Browne

Smita Krishnaswamy

Neuroscience employs diverse neuroimaging techniques, each offering distinct insights into brain activity, from electrophysiological recordings such as EEG, which have high temporal resolution, to hemodynamic modalities such as fMRI, which have increased spatial precision. However, integrating these heterogeneous data sources remains a challenge, which limits a comprehensive understanding of brain function. We present the Spatiotemporal Alignment of Multimodal Brain Activity (SAMBA) framework, which bridges the spatial and temporal resolution gaps across modalities by learning a unified latent space free of modality-specific biases. SAMBA introduces a novel attention-based wavelet decomposition for spectral filtering of electrophysiological recordings, graph attention networks to model functional connectivity between functional brain units, and recurrent layers to capture temporal autocorrelations in brain signals. We show that the training of SAMBA, aside from achieving translation, also learns a rich representation of brain information processing. We showcase this by classifying external stimuli driving brain activity from the representation learned in hidden layers of SAMBA, paving the way for broad downstream applications in neuroscience research and clinical contexts.

2024-09-27

ArXiv (preprint)

Accelerating Training with Neuron Interaction and Nowcasting Networks

Boris Knyazev

Abhinav Moudgil

Eugene Belilovsky

Simon Lacoste-Julien

Neural network training can be accelerated when a learnable update rule is used in lieu of classic adaptive optimizers (e.g. Adam). However, learnable update rules can be costly and unstable to train and use. Recently, Jang et al. (2023) proposed a simpler approach to accelerate training based on weight nowcaster networks (WNNs). In their approach, Adam is used for most of the optimization steps and periodically, only every few steps, a WNN nowcasts (predicts near future) parameters. We improve WNNs by proposing neuron interaction and nowcasting (NiNo) networks. In contrast to WNNs, NiNo leverages neuron connectivity and graph neural networks to more accurately nowcast parameters. We further show that in some networks, such as Transformers, modeling neuron connectivity accurately is challenging. We address this and other limitations, which allows NiNo to accelerate Adam training by up to 50% in vision and language tasks.

2024-09-06

ArXiv (preprint)

When can transformers compositionally generalize in-context?

Seijin Kobayashi

Simon Schug

Yassir Akram

Florian Redhardt

Johannes Von Oswald

Razvan Pascanu

João Sacramento

Many tasks can be composed from a few independent components. This gives rise to a combinatorial explosion of possible tasks, only some of which might be encountered during training. Under what circumstances can transformers compositionally generalize from a subset of tasks to all possible combinations of tasks that share similar components? Here we study a modular multitask setting that allows us to precisely control compositional structure in the data generation process. We present evidence that transformers learning in-context struggle to generalize compositionally on this task despite being in principle expressive enough to do so. Compositional generalization becomes possible only when introducing a bottleneck that enforces an explicit separation between task inference and task execution.

2024-07-17

ArXiv (preprint)