Sarath Chandar

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering
Associate Professor, Université de Montréal, Department of Computer Science and Operations Research
Indian Institute of Technology Madras
Research Topics
AI Alignment
Medical Machine Learning
Representation Learning
Online Learning
Reinforcement Learning
Transfer Learning
Deep Learning
Lifelong Learning
Large Language Models (LLM)
Trustworthy AI
Interpretability
Foundation Models
Optimization
Recurrent Neural Networks
Multi-Agent Systems
Natural Language Processing
XAI (Explainable AI)

Biography

Sarath Chandar is an associate professor in the Department of Computer Engineering and Software Engineering at Polytechnique Montréal, where he leads the Chandar Research Lab. He is also a core academic member at Mila – Quebec Artificial Intelligence Institute, and holds a Canada CIFAR AI Chair and a Canada Research Chair in Lifelong Machine Learning.

His research focuses on lifelong learning, deep learning, optimization, reinforcement learning, and natural language processing. To promote research on lifelong learning, Sarath Chandar founded the Conference on Lifelong Learning Agents (CoLLAs) in 2022 and served as program chair in 2022 and 2023. He holds a PhD from Université de Montréal and a master's by research from the Indian Institute of Technology Madras.

Current Students

PhD - UdeM
Research Master's - Polytechnique
PhD - Polytechnique
Co-supervisor:
Research Collaborator
Research Master's - McGill
PhD - Polytechnique
Principal supervisor:
PhD - Polytechnique
PhD - UdeM
Principal supervisor:
Research Collaborator
Principal supervisor:
PhD - UdeM
PhD - UdeM
Co-supervisor:
Postdoctorate - Polytechnique
PhD - Polytechnique
Research Master's - UdeM
Co-supervisor:
PhD - Polytechnique
Research Collaborator - Polytechnique
PhD - UdeM
PhD - Polytechnique
PhD - UdeM
Research Collaborator - Polytechnique Montreal
Research Master's - Polytechnique
Alumni Collaborator
PhD - Polytechnique
Research Master's - Polytechnique
Principal supervisor:
PhD - Polytechnique
Postdoctorate - UdeM
Research Master's - UdeM
PhD - Polytechnique
PhD - Polytechnique
PhD - Polytechnique
PhD - Polytechnique

Publications

NeuroFaith: Evaluating Mechanistic Faithfulness of LLM Free Text Self-Explanation at the Concept Level
Jean-Noël Vittaut
Nicolas Chesneau
Marie-Jeanne Lesot
Large Language Models (LLMs) can generate plausible free text self-explanations to justify their answers. However, these natural language explanations may not accurately reflect the model's actual reasoning process, indicating a lack of faithfulness. Existing faithfulness evaluation methods rely primarily on behavioral tests or computational block analysis without examining the semantic content of internal neural representations. This paper proposes NeuroFaith, a flexible framework that measures the faithfulness of LLM free text self-explanation by identifying key concepts within explanations and mechanistically testing whether these concepts actually influence the model's predictions. We show the versatility of NeuroFaith across 2-hop reasoning and classification tasks. Additionally, we develop a linear faithfulness probe based on NeuroFaith to detect unfaithful self-explanations from representation space and improve faithfulness through steering. NeuroFaith provides a principled approach to evaluating and enhancing the faithfulness of LLM free text self-explanations, addressing critical needs for trustworthy AI systems.
Failed Reasoning Traces Tell You What Is Fixable (But Not by Reading Them)
When post-trained language models fail on reasoning problems, the common test-time-scaling response is to spend more compute on additional attempts, and the failed traces play no further role. We argue this discards a crucial signal; some failures come from unlucky sampling, where more rollouts help, while others are structural and resist resampling regardless of budget. We propose that failed traces encode recoverability structure: the inference-time signature of which test-time interventions can rescue a given failure. Three problem-level trajectory features, derived from the structure of available interventions, recover this structure from the distributional signature of failed rollouts, not their text. They cluster failures into stable regimes, characterize the failure topography of different post-training methods (
CrysTune: Crystal Generation via Fine-Tuning of Large Language Models on Wyckoff Representations
The discovery of novel materials is essential for driving scientific and technological breakthroughs. Recent work has explored fine-tuning large language models (LLMs) for autoregressive crystal generation, but the ideal representation and training strategies for symmetry-based inductive biases remain unclear. We propose CrysTune, a class of LLMs fine-tuned on Wyckoff representations of crystals with two auxiliary tasks: canonicalization and template prediction. CrysTune shows competitive performance and improved stability-related metrics relative to LLMs trained on standard string-encoded representations. We further use these models as initial policies for reinforcement learning (RL) fine-tuning to optimize stability, validity, uniqueness, novelty, and diversity. RL-trained policies produce more valid and metastable crystals, while introducing novelty and diversity trade-offs. We also explore crystal system conditioning, showing that RL-trained policies produce a higher proportion of crystals matching the target condition.
Probabilistic Calibration Is a Trainable Capability in Language Models
Language models are increasingly used in settings where outputs must satisfy user-specified randomness constraints, yet their generation probabilities are often poorly calibrated to those targets. We study whether this capability can be improved directly through fine-tuning. Concretely, we fine-tune language models on synthetic prompts that require sampling from mathematical distributions, and compare two Calibration Fine-Tuning variants: a soft-target method that converts the desired output distribution into trie-derived next-token targets, and a hard-target method that trains on sampled completions from the same target distribution. Across 12 models spanning four families, both methods substantially improve structured-sampling fidelity on held-out distribution families and unseen parameter settings, showing that probabilistic calibration is a trainable capability. Under our selected training configurations, the two methods exhibit different empirical profiles: hard-target fine-tuning is often strongest on structured numeric sampling, while soft-target fine-tuning performs better on broader stochastic generation benchmarks, including open-ended random generation, multiple-choice answer-position balancing, and NoveltyBench. The gains sometimes reduce downstream capability, especially arithmetic reasoning, with costs varying by model. Overall, our results show that probabilistic calibration can be improved through fine-tuning, with our hard-target configuration favoring exact numeric fidelity and our soft-target configuration favoring broader stochastic transfer. Code is available at https://github.com/chandar-lab/calibration-finetuning.
TAPNext++: What's Next for Tracking Any Point (TAP)?
Sebastian Jung
Martin Sundermeyer
Carl Doersch
David Joseph Tan
Rudolph Triebel
Federico Tombari
Tracking-Any-Point (TAP) models aim to track any point through a video which is a crucial task in AR/XR and robotics applications. The recently introduced TAPNext approach proposes an end-to-end, recurrent transformer architecture to track points frame-by-frame in a purely online fashion -- demonstrating competitive performance at minimal latency. However, we show that TAPNext struggles with longer video sequences and also frequently fails to re-detect query points that reappear after being occluded or leaving the frame. In this work, we present TAPNext++, a model that tracks points in sequences that are orders of magnitude longer while preserving the low memory and compute footprint of the architecture. We train the recurrent video transformer using several data-driven solutions, including training on long 1024-frame sequences enabled by sequence parallelism techniques. We highlight that re-detection performance is a blind spot in the current literature and introduce a new metric, Re-Detection Average Jaccard (
Emergent Reasoning via Recursive Latent Reinforcement Pretraining
Large language models (LLMs) often rely on explicit chain-of-thought (CoT) traces to solve multi-step reasoning problems, but these traces increase inference cost, expose brittle prompt dependence, and complicate training objectives. We study an alternative: latent deliberation implemented as a small recurrent refinement module that performs multiple internal "thinking" steps while keeping the external sequence length fixed. We introduce Recursive Latent Reinforcement Pretraining (RLRP), a training recipe that augments a base causal LLM with a shared latent head executed for
Is Depth Heterogeneity a Barrier to Model Merging?
Model merging offers a way to combine the capabilities of several networks at test time without retraining or additional finetuning, but most merging methods assume identical architectures. Depth differences are commonly viewed as a major obstacle because they remove clear layer correspondences. We test this assumption by merging residual networks that differ only in depth, using a simple training-free pipeline based on identity expansion and permutation alignment. Across both same-task and multitask image classification experiments, heterogeneous merges closely match homogeneous ones. The results suggest that, for residual networks, depth mismatch is not the main barrier to effective model merging, and that the main difficulty in model merging comes from aligning independently trained weights in a homogeneous setting.
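The abstract does not spell out how identity expansion works, so the following is only a minimal sketch under one plausible reading: a shallower residual network is padded with extra residual blocks whose residual branch is zeroed, so each added block computes the identity and the network's function is unchanged. The block form `x + W2 · relu(W1 · x)` and the names `residual_block`, `identity_block`, and `forward` are illustrative assumptions, not taken from the paper, and permutation alignment is not shown.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def residual_block(x, W1, W2):
    """Assumed block form: x + W2 @ relu(W1 @ x)."""
    branch = matvec(W2, relu(matvec(W1, x)))
    return [xi + bi for xi, bi in zip(x, branch)]


def forward(x, blocks):
    """Apply a stack of residual blocks in order."""
    for W1, W2 in blocks:
        x = residual_block(x, W1, W2)
    return x


def identity_block(dim, hidden):
    """A block whose residual branch is all zeros, so it returns its input."""
    W1 = [[0.0] * dim for _ in range(hidden)]
    W2 = [[0.0] * hidden for _ in range(dim)]
    return (W1, W2)


# A one-block network with hypothetical weights, and a depth-2 expansion of it.
shallow = [([[1.0, -1.0], [0.5, 2.0]], [[0.3, 0.0], [-0.2, 0.1]])]
deep = shallow + [identity_block(dim=2, hidden=2)]

x = [1.0, 2.0]
print(forward(x, shallow) == forward(x, deep))
```

Once two networks have the same depth in this sense, layer-by-layer procedures designed for identical architectures can in principle be applied to the pair.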
Loss Smoothing for Continual Adaptation
Neural networks are often adapted in nonstationary data distributions settings where the objective is to optimize performance on the current task, and preserving accuracy on previous tasks is not required. As a result, existing methods primarily focus on improving plasticity, while stability is largely studied in the context of continual learning. In this work, we examine whether preserving stability can also be beneficial in model adaptation settings where past-task performance is irrelevant. We propose a simple loss smoothing approach that encourages selective adaptation by preserving task-shared features while modifying task-inconsistent ones. We evaluate our method on continual supervised model adaptation benchmarks and reinforcement learning benchmarks, and show that promoting representational stability during adaptation can improve performance across settings.
CoPeP: Benchmarking Continual Pretraining for Protein Language Models
Protein language models (pLMs) have recently gained significant attention for their ability to uncover relationships between sequence, structure, and function from evolutionary statistics, thereby accelerating therapeutic drug discovery. These models learn from large protein databases that are continuously updated by the biology community and whose dynamic nature motivates the application of continual learning, not only to keep up with the ever-growing data, but also as an opportunity to take advantage of the temporal meta-information that is created during this process. As a result, we introduce the Continual Pretraining of Protein Language Models (CoPeP) benchmark, a novel benchmark for evaluating continual learning approaches on pLMs. Specifically, we curate a sequence of protein datasets derived from the UniProt Knowledgebase spanning a decade and define metrics to assess pLM performance across 31 protein understanding tasks. We evaluate several methods from the continual learning literature, including replay, unlearning, and plasticity-based methods, some of which have never been applied to models and data of this scale. Our findings reveal that incorporating temporal meta-information improves perplexity by up to 7% even when compared to training on data from all tasks jointly. Moreover, even at scale, several continual learning methods outperform naive continual pretraining. The CoPeP benchmark offers an exciting opportunity to study these methods at scale in an impactful real-world application.
Monitoring morphometric drift in lifelong learning segmentation of the spinal cord.
Enamundram Naga Karthik
Christoph S. Aigner
Élise Bannier
Josef Bednařík
Virginie Callot
Anna Combes
Armin Curt
Gergely David
Falk Eippert
Lynn Farner
Michael G Fehlings
Patrick Freund
Tobias Granberg
Cristina Granziera
Rhscir Network Imaging Group
Ulrike Horn
Tomáš Horák
Suzanne Humphreys
Markus Hupp
Anne Kerbrat
Nawal Kinany
Shannon Kolind
Petr Kudlička
Anna Lebret
Lisa Eunyoung Lee
Caterina Mainero
Allan R. Martin
Megan McGrath
Govind Nair
Kristin P. O'Grady
Jiwon Oh
Russell Ouellette
Nikolai Pfender
Dario Pfyffer
Pierre-François Pradat
Alexandre Prat
Emanuele Pravatà
Daniel S. Reich
Ilaria Ricchi
Naama Rotem-Kohavi
Simon Schading-Sassenhausen
Maryam Seif
Andrew Smith
Seth A Smith
Grace Sweeney
Roger Tam
Anthony Traboulsee
Constantina Andrada Treaba
Charidimos Tsagkas
Zachary Vavasour
Dimitri Van De Ville
Kenneth Arnold Weber II
Morphometric measures derived from spinal cord segmentations can serve as diagnostic and prognostic biomarkers in neurological diseases and injuries affecting the spinal cord. For instance, the spinal cord cross-sectional area can be used to monitor cord atrophy in multiple sclerosis and to characterize compression in degenerative cervical myelopathy. While robust, automatic segmentation methods to a wide variety of contrasts and pathologies have been developed over the past few years, whether their predictions are stable as the model is updated using new datasets has not been assessed. This is particularly important for deriving normative values from healthy participants. In this study, we present a spinal cord segmentation model trained on a multisite (n=75) dataset, including 9 different MRI contrasts and several spinal cord pathologies. We also introduce a lifelong learning framework to automatically monitor the morphometric drift as the model is updated using additional datasets. The framework is triggered by an automatic GitHub Actions workflow every time a new model is created, recording the morphometric values derived from the model's predictions over time. As a real-world application of the proposed framework, we employed the spinal cord segmentation model to update a recently-introduced normative database of healthy participants containing commonly used measures of spinal cord morphometry.
Results showed that: (i) our model performs well compared to its previous versions and existing pathology-specific models on the lumbar spinal cord, images with severe compression, and in the presence of intramedullary lesions and/or atrophy achieving an average Dice score of 0.95 ± 0.03; (ii) the automatic workflow for monitoring morphometric drift provides a quick feedback loop for developing future segmentation models; and (iii) the scaling factor required to update the database of morphometric measures is nearly constant among slices across the given vertebral levels, showing minimum drift between the current and previous versions of the model monitored by the framework. The model is freely available in Spinal Cord Toolbox v7.0.
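For readers unfamiliar with the Dice score reported above, here is a minimal sketch of the standard definition, Dice = 2|A ∩ B| / (|A| + |B|), for two binary masks. The function name `dice_score`, the flattened-list input format, and the convention for two empty masks are assumptions made for illustration; this is not the evaluation code used in the study.

```python
def dice_score(pred, target):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


# Toy example: each mask marks 3 voxels and they share 2 of them.
pred = [0, 1, 1, 0, 1, 0]
target = [0, 1, 0, 0, 1, 1]
print(dice_score(pred, target))  # 2 * 2 / (3 + 3)
```

A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they do not overlap at all.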
Squeezing More from the Stream: Learning Representation Online for Streaming Reinforcement Learning
In streaming Reinforcement Learning (RL), transitions are observed and discarded immediately after a single update. While this minimizes resource usage for on-device applications, it makes agents notoriously sample-inefficient, since value-based losses alone struggle to extract meaningful representations from transient data. We propose extending Self-Predictive Representations (SPR) to the streaming pipeline to maximize the utility of every observed frame. However, due to the highly correlated samples induced by the streaming regime, naively applying this auxiliary loss results in training instabilities. Thus, we introduce orthogonal gradient updates relative to the momentum target and resolve gradient conflicts arising from streaming-specific optimizers. Validated across the Atari, MinAtar, and Octax suites, our approach systematically outperforms existing streaming baselines. Latent-space analysis, including t-SNE visualizations and effective-rank measurements, confirms that our method learns significantly richer representations, bridging the performance gap caused by the absence of a replay buffer, while remaining efficient enough to train on just a few CPU cores.
The Expressive Limits of Diagonal SSMs for State-Tracking
State-Space Models (SSMs) have recently been shown to achieve strong empirical performance on a variety of long-range sequence modeling tasks while remaining efficient and highly-parallelizable. However, the theoretical understanding of their expressive power remains limited. In this work, we study the expressivity of input-Dependent Complex-valued Diagonal (DCD) State-Space Models (SSMs) on sequential state-tracking tasks for abstract groups. It is easy to show that a single DCD SSM layer with a universal decoder can track any Abelian group at finite precision by decomposing it into a product of cyclic groups. We show that this is tight by proving that such a model cannot track any non-Abelian group at finite precision. We further establish the expressivity of multi-layer DCD SSMs. We show that a
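The claim that Abelian groups can be tracked through a product of cyclic groups can be illustrated with a small sketch. It assumes a scalar recurrence h_t = exp(2πi·g_t/n)·h_{t-1} with h_0 = 1 for each cyclic factor Z_n, and a decoder that rounds the phase of the final state. The function names and the decoder are hypothetical choices for illustration; the paper's exact DCD SSM parameterization is not given in the abstract.

```python
import cmath
import math


def track_cyclic(inputs, n):
    """Track the running sum of `inputs` in Z_n with one complex state.

    Each input g multiplies the state by the n-th root of unity exp(2*pi*i*g/n),
    an input-dependent diagonal (here 1x1) transition. The decoder reads the
    phase of the final state and rounds it to the nearest multiple of 2*pi/n.
    """
    h = 1 + 0j
    for g in inputs:
        h *= cmath.exp(2j * math.pi * g / n)
    angle = cmath.phase(h) % (2 * math.pi)  # map phase into [0, 2*pi)
    return round(angle * n / (2 * math.pi)) % n


def track_product(inputs, moduli):
    """Track an element of Z_{n1} x ... x Z_{nk} with one channel per factor.

    The channels never interact, which is what makes the transition diagonal.
    """
    return tuple(
        track_cyclic([g[k] for g in inputs], n) for k, n in enumerate(moduli)
    )


print(track_cyclic([1, 2, 3, 4], 5))                        # 10 mod 5
print(track_product([(1, 2), (1, 2), (1, 1)], (2, 3)))      # (3 mod 2, 5 mod 3)
```

Because the per-channel multiplications commute, the order of the inputs does not affect the final state, which is consistent with the abstract's statement that this construction covers Abelian groups only.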