Portrait de Sarath Chandar

Sarath Chandar

Membre académique principal
Chaire en IA Canada-CIFAR
Professeur associé, Polytechnique Montréal, Département d'informatique et de génie logiciel
Professeur associé, Université de Montréal, Département d'informatique et de recherche opérationnelle
Indian Institute of Technology Madras
Sujets de recherche
Alignement de l'IA
Apprentissage automatique médical
Apprentissage de représentations
Apprentissage en ligne
Apprentissage par renforcement
Apprentissage par transfert
Apprentissage profond
Apprentissage tout au long de la vie
Grands modèles de langage (LLM)
IA digne de confiance
Interprétabilité
Modèles de fondation
Optimisation
Réseaux de neurones récurrents
Systèmes multi-agents
Traitement du langage naturel
XAI (IA explicable)

Biographie

Sarath Chandar est professeur associé au départment de génie informatique et génie logiciel de Polytechnique Montréal, où il dirige le laboratoire de recherche Chandar. Il est également membre académique principal à Mila – Institut québécois d’intelligence artificielle, et titulaire d'une chaire en IA Canada-CIFAR et d'une Chaire de recherche du Canada en apprentissage machine permanent.

Ses recherches portent sur l'apprentissage tout au long de la vie, l'apprentissage profond, l'optimisation, l'apprentissage par renforcement et le traitement du langage naturel. Pour promouvoir la recherche sur l'apprentissage tout au long de la vie, Sarath Chandar a créé la Conférence sur les agents d'apprentissage tout au long de la vie (CoLLAs) en 2022 et a présidé le programme en 2022 et en 2023. Il est titulaire d'un doctorat de l'Université de Montréal et d'une maîtrise en recherche de l'Indian Institute of Technology Madras.

Étudiants actuels

Maîtrise recherche - UdeM
Maîtrise recherche - Polytechnique
Doctorat - Polytechnique
Co-superviseur⋅e :
Collaborateur·rice de recherche
Maîtrise recherche - McGill
Maîtrise recherche - Polytechnique
Doctorat - Polytechnique
Superviseur⋅e principal⋅e :
Doctorat - Polytechnique
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Collaborateur·rice de recherche
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Postdoctorat - Polytechnique
Collaborateur·rice alumni
Doctorat - Polytechnique
Maîtrise recherche - UdeM
Co-superviseur⋅e :
Doctorat - Polytechnique
Collaborateur·rice de recherche - Polytechnique
Doctorat - UdeM
Doctorat - Polytechnique
Doctorat - UdeM
Collaborateur·rice de recherche - Polytechnique Montreal
Maîtrise recherche - Polytechnique
Collaborateur·rice alumni
Doctorat - Polytechnique
Maîtrise recherche - Polytechnique
Superviseur⋅e principal⋅e :
Doctorat - Polytechnique
Postdoctorat - UdeM
Maîtrise recherche - UdeM
Doctorat - Polytechnique
Collaborateur·rice de recherche
Doctorat - Polytechnique
Doctorat - Polytechnique
Doctorat - Polytechnique

Publications

Loss Smoothing for Continual Adaptation
Neural networks are often adapted in nonstationary data distributions settings where the objective is to optimize performance on the current… (voir plus) task, and preserving accuracy on previous tasks is not required. As a result, existing methods primarily focus on improving plasticity, while stability is largely studied in the context of continual learning. In this work, we examine whether preserving stability can also be beneficial in model adaptation settings where past-task performance is irrelevant. We propose a simple loss smoothing approach that encourages selective adaptation by preserving task-shared features while modifying task-inconsistent ones. We evaluate our method on continual supervised model adaptation benchmarks and reinforcement learning benchmarks, and show that promoting representational stability during adaptation can improve performance across settings.
Monitoring morphometric drift in lifelong learning segmentation of the spinal cord.
Enamundram Naga Karthik
Christoph Stefan Aigner
Elise Bannier
Josef Bednařík
Virginie Callot
Anna Combes
Armin Curt
Gergely David
Falk Eippert
Lynn Farner
Michael G. Fehlings
Patrick Freund
Tobias Granberg
Cristina Granziera
Rhscir Network Imaging Group
Ulrike Horn
Tomáš Horák
Suzanne Humphreys … (voir 36 de plus)
Markus Hupp
Anne Kerbrat
Nawal Kinany
Shannon Kolind
Petr Kudlička
Anna Lebret
Lisa Eunyoung Lee
Cristina Granziera
Allan R. Martin
Govind Nair
Megan McGrath
Kristin P. O’Grady
Jiwon Oh
Russell Ouellette
Nikolai Pfender
Dario Pfyffer
Pierre‐François Pradat
Alexandre Prat
Alexandre Prat
Daniel S. Reich
Ilaria Ricchi
Naama Rotem‐Kohavi
Simon Schading-Sassenhausen
Maryam Seif
Andrew Smith
Seth A. Smith
Grace Sweeney
Roger Tam
Anthony Traboulsee
Constantina A. Treaba
Charidimos Tsagkas
Dimitri Van De Ville
Zachary Vavasour
Kenneth A. Weber
Morphometric measures derived from spinal cord segmentations can serve as diagnostic and prognostic biomarkers in neurological diseases and … (voir plus)injuries affecting the spinal cord. For instance, the spinal cord cross-sectional area can be used to monitor cord atrophy in multiple sclerosis and to characterize compression in degenerative cervical myelopathy. While robust, automatic segmentation methods to a wide variety of contrasts and pathologies have been developed over the past few years, whether their predictions are stable as the model is updated using new datasets has not been assessed. This is particularly important for deriving normative values from healthy participants. In this study, we present a spinal cord segmentation model trained on a multisite (n=75) dataset, including 9 different MRI contrasts and several spinal cord pathologies. We also introduce a lifelong learning framework to automatically monitor the morphometric drift as the model is updated using additional datasets. The framework is triggered by an automatic GitHub Actions workflow every time a new model is created, recording the morphometric values derived from the model's predictions over time. As a real-world application of the proposed framework, we employed the spinal cord segmentation model to update a recently-introduced normative database of healthy participants containing commonly used measures of spinal cord morphometry. Results showed that: (i) our model performs well compared to its previous versions and existing pathology-specific models on the lumbar spinal cord, images with severe compression, and in the presence of intramedullary lesions and/or atrophy achieving an average Dice score of 0.95 ± 0.03; (ii) the automatic workflow for monitoring morphometric drift provides a quick feedback loop for developing future segmentation models; and (iii) the scaling factor required to update the database of morphometric measures is nearly constant among slices across the given vertebral levels, showing minimum drift between the current and previous versions of the model monitored by the framework. The model is freely available in Spinal Cord Toolbox v7.0.
The Expressive Limits of Diagonal SSMs for State-Tracking
State-Space Models (SSMs) have recently been shown to achieve strong empirical performance on a variety of long-range sequence modeling task… (voir plus)s while remaining efficient and highly-parallelizable. However, the theoretical understanding of their expressive power remains limited. In this work, we study the expressivity of input-Dependent Complex-valued Diagonal (DCD) State-Space Models (SSMs) on sequential state-tracking tasks for abstract groups. It is easy to show that a single DCD SSM layer with a universal decoder can track any Abelian group at finite precision by decomposing it into a product of cyclic groups. We show that this is tight by proving that such a model cannot track any non-Abelian group at finite precision. We further establish the expressivity of multi-layer DCD SSMs. We show that a
The Markovian Thinker
Reasoning LLMs suffer from quadratic compute growth as their context length increases, making reinforcement learning with verifiable rewards… (voir plus) (RLVR) and test-time scaling prohibitively expensive. Prior work has tried to lighten the computational burden by shortening reasoning traces through pruning, summarization, or multi-stage training, but these methods remain bound to quadratic costs. We introduce Delethink, a thinking algorithm that realizes the Markovian Thinking Paradigm. Instead of producing one long monolithic reasoning trace, Delethink thinks in a sequence of chunks, the Delethink trace. Each chunk continues reasoning by referring only to a fixed number of prior tokens, which functions as a Markovian state sufficient for progressing reasoning, while deleting the rest. This preserves continuity without carrying the quadratic baggage. As a result, compute scales linearly and peak memory remains constant. In experiments, we show that Delethink can be applied directly to off-the-shelf reasoning models ranging from
Sub-Goal Distillation: A Method to Improve Small Language Agents
While Large Language Models (LLMs) have demonstrated significant promise as agents in interactive tasks, their substantial computational req… (voir plus)uirements and restricted number of calls constrain their practical utility, especially in long-horizon interactive tasks such as decision-making or in scenarios involving continuous ongoing tasks. To address these constraints, we propose a method for transferring the performance of an LLM with billions of parameters to a much smaller language model (770M parameters). Our approach involves constructing a hierarchical agent comprising a planning module, which learns through Knowledge Distillation from an LLM to generate sub-goals, and an execution module, which learns to accomplish these sub-goals using elementary actions. In detail, we leverage an LLM to annotate an oracle path with a sequence of sub-goals towards completing a goal. Subsequently, we utilize this annotated data to fine-tune both the planning and execution modules. Importantly, neither module relies on real-time access to an LLM during inference, significantly reducing the overall cost associated with LLM interactions to a fixed cost. In ScienceWorld, a challenging and multi-task interactive text environment, our method surpasses standard imitation learning based solely on elementary actions by 16.7% (absolute). Our analysis highlights the efficiency of our approach compared to other LLM-based methods. Our code and annotated data for distillation can be found on GitHub.
A Generalist Hanabi Agent
Traditional multi-agent reinforcement learning (MARL) systems can develop cooperative strategies through repeated interactions. However, the… (voir plus)se systems are unable to perform well on any other setting than the one they have been trained on, and struggle to successfully cooperate with unfamiliar collaborators. This is particularly visible in the Hanabi benchmark, a popular 2-to-5 player cooperative card-game which requires complex reasoning and precise assistance to other agents. Current MARL agents for Hanabi can only learn one specific game-setting (e.g., 2-player games), and play with the same algorithmic agents. This is in stark contrast to humans, who can quickly adjust their strategies to work with unfamiliar partners or situations. In this paper, we introduce Recurrent Replay Relevance Distributed DQN (R3D2), a generalist agent for Hanabi, designed to overcome these limitations. We reformulate the task using text, as language has been shown to improve transfer. We then propose a distributed MARL algorithm that copes with the resulting dynamic observation- and action-space. In doing so, our agent is the first that can play all game settings concurrently, and extend strategies learned from one setting to other ones. As a consequence, our agent also demonstrates the ability to collaborate with different algorithmic agents -- agents that are themselves unable to do so. The implementation code is available at:
The Markovian Thinker: Architecture-Agnostic Linear Scaling of Reasoning
Reinforcement learning (RL) has recently become a strong recipe for training reasoning LLMs that produce long chains of thought (LongCoT). Y… (voir plus)et the standard RL"thinking environment", where the state is the prompt plus all prior reasoning tokens, makes the state unbounded and forces attention-based policies to pay quadratic compute as thoughts lengthen. We revisit the environment itself. We propose Markovian Thinking, a paradigm in which the policy advances reasoning while conditioning on a constant-size state, decoupling thinking length from context size. As an immediate consequence this yields linear compute with constant memory. We instantiate this idea with Delethink, an RL environment that structures reasoning into fixed-size chunks. Within each chunk, the model thinks as usual; at the boundary, the environment resets the context and reinitializes the prompt with a short carryover. Through RL, the policy learns to write a textual state near the end of each chunk sufficient for seamless continuation of reasoning after reset. Trained in this environment, an R1-Distill 1.5B model reasons in 8K-token chunks yet thinks up to 24K tokens, matching or surpassing LongCoT-RL trained with a 24K budget. With test-time scaling, Delethink continues to improve where LongCoT plateaus. The effect of linear compute is substantial: we empirically estimate at 96K average thinking length LongCoT-RL costs 27 H100-months vs. 7 for Delethink. Analysis at RL initialization shows off-the-shelf reasoning models (1.5B-120B) often sample Markovian traces zero-shot across diverse benchmarks, providing positive samples that make RL effective at scale. Our results show that redesigning the thinking environment is a powerful lever: it enables very long reasoning without quadratic overhead and opens a path toward efficient, scalable reasoning LLMs.
Torque-Aware Momentum
Efficiently exploring complex loss landscapes is key to the performance of deep neural networks. While momentum-based optimizers are widely … (voir plus)used in state-of-the-art setups, classical momentum can still struggle with large, misaligned gradients, leading to oscillations. To address this, we propose Torque-Aware Momentum (TAM), which introduces a damping factor based on the angle between the new gradients and previous momentum, stabilizing the update direction during training. Empirical results show that TAM, which can be combined with both SGD and Adam, enhances exploration, handles distribution shifts more effectively, and improves generalization performance across various tasks, including image classification and large language model fine-tuning, when compared to classical momentum-based optimizers.
Toward Debugging Deep Reinforcement Learning Programs with RLExplorer
Deep reinforcement learning (DRL) has shown success in diverse domains such as robotics, computer games, and recommendation systems. However… (voir plus), like any other software system, DRL-based software systems are susceptible to faults that pose unique challenges for debugging and diagnosing. These faults often result in unexpected behavior without explicit failures and error messages, making debugging difficult and time-consuming. Therefore, automating the monitoring and diagnosis of DRL systems is crucial to alleviate the burden on developers. In this paper, we propose RLExplorer, the first fault diagnosis approach for DRL-based software systems. RLExplorer automatically monitors training traces and runs diagnosis routines based on properties of the DRL learning dynamics to detect the occurrence of DRL-specific faults. It then logs the results of these diagnoses as warnings that cover theoretical concepts, recommended practices, and potential solutions to the identified faults. We conducted two sets of evaluations to assess RLExplorer. Our first evaluation of faulty DRL samples from Stack Overflow revealed that our approach can effectively diagnose real faults in 83% of the cases. Our second evaluation of RLExplorer with 15 DRL experts/developers showed that (1) RLExplorer could identify 3.6 times more defects than manual debugging and (2) RLExplorer is easily integrated into DRL applications.
Balancing Context Length and Mixing Times for Reinforcement Learning at Scale
Due to the recent remarkable advances in artificial intelligence, researchers have begun to consider challenging learning problems such as … (voir plus)learning to generalize behavior from large offline datasets or learning online in non-Markovian environments. Meanwhile, recent advances in both of these areas have increasingly relied on conditioning policies on large context lengths. A natural question is if there is a limit to the performance benefits of increasing the context length if the computation needed is available. In this work, we establish a novel theoretical result that links the context length of a policy to the time needed to reliably evaluate its performance (i.e., its mixing time) in large scale partially observable reinforcement learning environments that exhibit latent sub-task structure. This analysis underscores a key tradeoff: when we extend the context length, our policy can more effectively model non-Markovian dependencies, but this comes at the cost of potentially slower policy evaluation and as a result slower downstream learning. Moreover, our empirical results highlight the relevance of this analysis when leveraging Transformer based neural networks. This perspective will become increasingly pertinent as the field scales towards larger and more realistic environments, opening up a number of potential future directions for improving the way we design learning agents.
Protein Language Models: Is Scaling Necessary?
Robert M. Vernon
Benjamin Schulz
Christopher James Langmead
Public protein sequence databases contain samples from the fitness landscape explored by nature. Protein language models (pLMs) pre-trained … (voir plus)on these sequences aim to capture this landscape for tasks like property prediction and protein design. Following the same trend as in natural language processing, pLMs have continuously been scaled up. However, the premise that scale leads to better performance assumes that source databases provide an accurate representation of the underlying fitness landscape, which is likely false. By developing an efficient codebase, designing a modern architecture, and addressing data quality concerns such as sample bias, we introduce AMPLIFY, a best-in-class pLM that is orders of magnitude less expensive to train and deploy than previous models. Furthermore, to support the scientific community and democratize the training of pLMs, we have open-sourced AMPLIFY’s pre-training codebase, data, and model checkpoints.
Lookbehind-SAM: k Steps Back, 1 Step Forward
Sharpness-aware minimization (SAM) methods have gained increasing popularity by formulating the problem of minimizing both loss value and lo… (voir plus)ss sharpness as a minimax objective. In this work, we increase the efficiency of the maximization and minimization parts of SAM's objective to achieve a better loss-sharpness trade-off. By taking inspiration from the Lookahead optimizer, which uses multiple descent steps ahead, we propose Lookbehind, which performs multiple ascent steps behind to enhance the maximization step of SAM and find a worst-case perturbation with higher loss. Then, to mitigate the variance in the descent step arising from the gathered gradients across the multiple ascent steps, we employ linear interpolation to refine the minimization step. Lookbehind leads to a myriad of benefits across a variety of tasks. Particularly, we show increased generalization performance, greater robustness against noisy weights, as well as improved learning and less catastrophic forgetting in lifelong learning settings. Our code is available at https://github.com/chandar-lab/Lookbehind-SAM.