Portrait de Sarath Chandar

Sarath Chandar

Membre académique principal
Chaire en IA Canada-CIFAR
Professeur associé, Polytechnique Montréal, Département d'informatique et de génie logiciel
Professeur associé, Université de Montréal, Département d'informatique et de recherche opérationnelle
Indian Institute of Technology Madras
Sujets de recherche
Alignement de l'IA
Apprentissage automatique médical
Apprentissage de représentations
Apprentissage en ligne
Apprentissage par renforcement
Apprentissage par transfert
Apprentissage profond
Apprentissage tout au long de la vie
Grands modèles de langage (LLM)
IA digne de confiance
Interprétabilité
Modèles de fondation
Optimisation
Réseaux de neurones récurrents
Systèmes multi-agents
Traitement du langage naturel
XAI (IA explicable)

Biographie

Sarath Chandar est professeur associé au départment de génie informatique et génie logiciel de Polytechnique Montréal, où il dirige le laboratoire de recherche Chandar. Il est également membre académique principal à Mila – Institut québécois d’intelligence artificielle, et titulaire d'une chaire en IA Canada-CIFAR et d'une Chaire de recherche du Canada en apprentissage machine permanent.

Ses recherches portent sur l'apprentissage tout au long de la vie, l'apprentissage profond, l'optimisation, l'apprentissage par renforcement et le traitement du langage naturel. Pour promouvoir la recherche sur l'apprentissage tout au long de la vie, Sarath Chandar a créé la Conférence sur les agents d'apprentissage tout au long de la vie (CoLLAs) en 2022 et a présidé le programme en 2022 et en 2023. Il est titulaire d'un doctorat de l'Université de Montréal et d'une maîtrise en recherche de l'Indian Institute of Technology Madras.

Étudiants actuels

Maîtrise recherche - UdeM
Maîtrise recherche - Polytechnique
Doctorat - Polytechnique
Co-superviseur⋅e :
Stagiaire de recherche - Polytechnique
Collaborateur·rice de recherche
Maîtrise recherche - McGill
Maîtrise recherche - Polytechnique
Doctorat - Polytechnique
Superviseur⋅e principal⋅e :
Doctorat - Polytechnique
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Doctorat - Polytechnique
Maîtrise recherche - UdeM
Co-superviseur⋅e :
Doctorat - Polytechnique
Postdoctorat - Polytechnique
Superviseur⋅e principal⋅e :
Stagiaire de recherche - Polytechnique
Stagiaire de recherche - Polytechnique
Doctorat - UdeM
Doctorat - Polytechnique
Doctorat - UdeM
Visiteur de recherche indépendant
Maîtrise recherche - UdeM
Maîtrise recherche - Polytechnique
Maîtrise recherche - UdeM
Doctorat - Polytechnique
Collaborateur·rice de recherche
Stagiaire de recherche - Polytechnique
Doctorat - Polytechnique
Doctorat - Polytechnique
Doctorat - Polytechnique

Publications

Parity Requires Unified Input Dependence and Negative Eigenvalues in SSMs
Behnoush Khavari
Jayesh Khullar
Franccois Rivest
Recent work has shown that LRNN models such as S4D, Mamba, and DeltaNet lack state-tracking capability due to either time-invariant transiti… (voir plus)on matrices or restricted eigenvalue ranges. To address this, input-dependent transition matrices, particularly those that are complex or non-triangular, have been proposed to enhance SSM performance on such tasks. While existing theorems demonstrate that both input-independent and non-negative SSMs are incapable of solving simple state-tracking tasks, such as parity, regardless of depth, they do not explore whether combining these two types in a multilayer SSM could help. We investigate this question for efficient SSMs with diagonal transition matrices and show that such combinations still fail to solve parity. This implies that a recurrence layer must both be input-dependent and include negative eigenvalues. Our experiments support this conclusion by analyzing an SSM model that combines S4D and Mamba layers.
Revisiting Replay and Gradient Alignment for Continual Pre-Training of Large Language Models
Istabrak Abbes
Matthew D Riemer
Tsuguchika Tabaru
Hiroaki Kingetsu
Training large language models (LLMs) typically involves pre-training on massive corpora, only to restart the process entirely when new data… (voir plus) becomes available. A more efficient and resource-conserving approach would be continual pre-training, where models are updated with new data rather than retraining from scratch. However, the introduction of new data often causes distribution shifts, leading to performance degradation on previously learned tasks. In this paper, we take a deeper look at two popular proposals for addressing this distribution shift within the continual learning literature: experience replay and gradient alignment. We consider continual pre-training of models within the Llama family of architectures at a large scale across languages with 100 billion tokens of training data in each language, finding that both replay and gradient alignment lead to more stable learning without forgetting. This conclusion holds both as we vary the model scale and as we vary the number and diversity of tasks. Moreover, we are the first to demonstrate the effectiveness of gradient alignment techniques in the context of LLM pre-training and propose an efficient implementation of meta-experience replay (MER) that imbues experience replay with the benefits of gradient alignment despite negligible compute and memory overhead. Our scaling analysis across model sizes and replay rates indicates that small rates of replaying old examples are definitely a more valuable use of compute than investing in model size, but that it is more compute efficient to scale the size of the model than invest in high rates of replaying old examples.
Revisiting Replay and Gradient Alignment for Continual Pre-Training of Large Language Models
Istabrak Abbes
Matthew D Riemer
Tsuguchika Tabaru
Hiroaki Kingetsu
Boosting LLM Reasoning via Spontaneous Self-Correction
Tengyu Xu
Xuewei Wang
Zhengxing Chen
Di Jin
Liang Tan
Yen-Ting Lin
Zishun Yu
Zhuokai Zhao
Si-Yuan Wang
Yun He
Sinong Wang
Han Fang
MetaAI
Chen Zhu
Mila - Québec
AI Institute
Polytechnique Montréal
While large language models (LLMs) have demonstrated remarkable success on a broad range of tasks, math reasoning remains a challenging one.… (voir plus) One of the approaches for improving math reasoning is self-correction, which designs self-improving loops to let the model correct its own mistakes. However, existing self-correction approaches treat corrections as standalone post-generation refinements, relying on extra prompt and system designs to elicit self-corrections, instead of performing real-time, spontaneous self-corrections in a single pass. To address this, we propose SPOC, a spontaneous self-correction approach that enables LLMs to generate interleaved solutions and verifications in a single inference pass, with generation dynamically terminated based on verification outcomes, thereby effectively scaling inference time compute. SPOC considers a multi-agent perspective by assigning dual roles -- solution proposer and verifier -- to the same model. We adopt a simple yet effective approach to generate synthetic data for fine-tuning, enabling the model to develop capabilities for self-verification and multi-agent collaboration. We further improve its solution proposal and verification accuracy through online reinforcement learning. Experiments on mathematical reasoning benchmarks show that SPOC significantly improves performance. Notably, SPOC boosts the accuracy of Llama-3.1-8B and 70B Instruct models, achieving gains of 8.8% and 11.6% on MATH500, 10.0% and 20.0% on AMC23, and 3.3% and 6.7% on AIME24, respectively.
Steering Large Language Model Activations in Sparse Spaces
CADmium: Fine-Tuning Code Language Models for Text-Driven Sequential CAD Design
Computer-aided design (CAD) is the digital construction of 2D and 3D objects, and is central to a wide range of engineering and manufacturin… (voir plus)g applications like automobile and aviation. Despite its importance, CAD modeling remains largely a time-intensive, manual task. Recent works have attempted to automate this process with small transformer-based models and handcrafted CAD sequence representations. However, there has been little effort to leverage the potential of large language models (LLMs) for sequential CAD design. In this work, we introduce a new large-scale dataset of more than 170k CAD models annotated with high-quality, human-like descriptions generated with our pipeline based on GPT-4.1. Using this dataset, we fine-tune powerful code-LLMs to generate CAD sequences represented in a JSON-based format from natural language descriptions, demonstrating the viability and effectiveness of this approach for text-conditioned CAD generation. Because simple metrics often fail to reflect the quality of generated objects, we introduce geometric and topological metrics based on sphericity, mean curvature, and Euler characteristic to provide richer structural insights. Our experiments and ablation studies on both synthetic and human-annotated data demonstrate that CADmium is able to automate CAD design, drastically speeding up the design of new objects. The dataset, code, and fine-tuned models are available online.
Optimizers Qualitatively Alter Solutions And We Should Leverage This
Clare Lyle
Ionut-Vlad Modoranu
Naima Elosegui Borras
Dan Alistarh
Petar Veličković
Soham De
James Martens
Due to the nonlinear nature of Deep Neural Networks (DNNs), one can not guarantee convergence to a unique global minimum of the loss when us… (voir plus)ing optimizers relying only on local information, such as SGD. Indeed, this was a primary source of skepticism regarding the feasibility of DNNs in the early days of the field. The past decades of progress in deep learning have revealed this skepticism to be misplaced, and a large body of empirical evidence shows that sufficiently large DNNs following standard training protocols exhibit well-behaved optimization dynamics that converge to performant solutions. This success has biased the community to use convex optimization as a mental model for learning, leading to a focus on training efficiency, either in terms of required iteration, FLOPs or wall-clock time, when improving optimizers. We argue that, while this perspective has proven extremely fruitful, another perspective specific to DNNs has received considerably less attention: the optimizer not only influences the rate of convergence, but also the qualitative properties of the learned solutions. Restated, the optimizer can and will encode inductive biases and change the effective expressivity of a given class of models. Furthermore, we believe the optimizer can be an effective way of encoding desiderata in the learning process. We contend that the community should aim at understanding the biases of already existing methods, as well as aim to build new optimizers with the explicit intent of inducing certain properties of the solution, rather than solely judging them based on their convergence rates. We hope our arguments will inspire research to improve our understanding of how the learning process can impact the type of solution we converge to, and lead to a greater recognition of optimizers design as a critical lever that complements the roles of architecture and data in shaping model outcomes.
Small Encoders Can Rival Large Decoders in Detecting Groundedness
Istabrak Abbes
Fernando Rodriguez
Alaa Boukhary
Adam Elwood
Augmenting large language models (LLMs) with external context significantly improves their performance in natural language processing (NLP) … (voir plus)tasks. However, LLMs struggle to answer queries reliably when the provided context lacks information, often resorting to ungrounded speculation or internal knowledge. Groundedness - generating responses strictly supported by the context - is essential for ensuring factual consistency and trustworthiness. This study focuses on detecting whether a given query is grounded in a document provided in context before the costly answer generation by LLMs. Such a detection mechanism can significantly reduce both inference time and resource consumption. We show that lightweight, task specific encoder models such as RoBERTa and NomicBERT, fine-tuned on curated datasets, can achieve accuracy comparable to state-of-the-art LLMs, such as Llama3 8B and GPT4o, in groundedness detection while reducing inference latency by orders of magnitude. The code is available at : https://github.com/chandarlab/Hallucinate-less
Revisiting Laplacian Representations for Value Function Approximation in Deep RL
Proto-value functions (PVFs) introduced Laplacian embeddings as an effective feature basis for value-function approximation; however, their … (voir plus)utility remained limited to small, fully known state spaces. Recent work has scaled Laplacian embeddings to high-dimensional inputs, using them for reward shaping and option discovery in goal-directed tasks, yet only as auxiliary signals, rather than directly using them as features for value functions. In this paper, we learn Laplacian eigenvectors online and employ them as features for Q-learning in 23 Atari games. We empirically demonstrate that these online–learned embeddings substantially improve model-free RL in large, high-dimensional domains. We demonstrate that enriching state representations with action embeddings yields additional gains under both behavior-policy and uniform-random policies. Additionally, we introduce the Fusion architecture, which augments the representation with useful inductive bias at the embedding level. To assess the usefulness of each embedding used in the Fusion architecture, we use Shapley values analysis.
Monitoring morphometric drift in lifelong learning segmentation of the spinal cord
Enamundram Naga Karthik
Sandrine Bédard
Christoph S. Aigner
Elise Bannier
Josef Bednařík
Virginie Callot
Anna Combes
Armin Curt
Gergely David
Falk Eippert
Lynn Farner
Michael G Fehlings
Patrick Freund
Tobias Granberg
Cristina Granziera
Rhscir Network Imaging Group
Ulrike Horn
Tomáš Horák … (voir 36 de plus)
Suzanne Humphreys
Markus Hupp
Anne Kerbrat
Nawal Kinany
Shannon Kolind
Anna Lebret
Petr Kudlička
Lisa Eunyoung Lee
Caterina Mainero
Allan R. Martin
Megan McGrath
Govind Nair
Kristin P. O’Grady
Jiwon Oh
Russell Ouellette
Nikolai Pfender
Dario Pfyffer
P. Pradat
Alexandre Prat
Emanuele Pravatà
Daniel S. Reich
Ilaria Ricchi
Naama Rotem-Kohavi
Simon Schading-Sassenhausen
Maryam Seif
Andrew C. Smith
Seth Aaron Smith
Grace Sweeney
Roger Tam
Anthony Traboulsee
Constantina A. Treaba
Charidimos Tsagkas
Zachary Vavasour
Dimitri Van De Ville
Kenneth A. Weber
NovoMolGen: Rethinking Molecular Language Model Pretraining
Kamran Chitsaz
Roshan Balaji
Nirav Pravinbhai Bhatt
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
Mahmoud Assran
Adrien Bardes
David Fan
Quentin Garrido
Russell Howes
Mojtaba Komeili
Matthew J. Muckley
Ammar Rizvi
Claire Roberts
Koustuv Sinha
Sergio Arnaud
Abha Gejji
Ada Martin
Francois Robert Hogan
Daniel Dugas
Piotr Bojanowski
Vasil Khalidov
Patrick Labatut
Francisco Massa … (voir 13 de plus)
Marc Szafraniec
K. Krishnakumar
Ying Li
Xiaodong Ma
Franziska Meier
Yann LeCun
Nicolas Ballas
Fair at Meta
Mila - Québec
AI Institute
Polytechnique Montréal
A major challenge for modern AI is to learn to understand the world and learn to act largely by observation. This paper explores a self-supe… (voir plus)rvised approach that combines internet-scale video data with a small amount of interaction data (robot trajectories), to develop models capable of understanding, predicting, and planning in the physical world. We first pre-train an action-free joint-embedding-predictive architecture, V-JEPA 2, on a video and image dataset comprising over 1 million hours of internet video. V-JEPA 2 achieves strong performance on motion understanding (77.3 top-1 accuracy on Something-Something v2) and state-of-the-art performance on human action anticipation (39.7 recall-at-5 on Epic-Kitchens-100) surpassing previous task-specific models. Additionally, after aligning V-JEPA 2 with a large language model, we demonstrate state-of-the-art performance on multiple video question-answering tasks at the 8 billion parameter scale (e.g., 84.0 on PerceptionTest, 76.9 on TempCompass). Finally, we show how self-supervised learning can be applied to robotic planning tasks by post-training a latent action-conditioned world model, V-JEPA 2-AC, using less than 62 hours of unlabeled robot videos from the Droid dataset. We deploy V-JEPA 2-AC zero-shot on Franka arms in two different labs and enable picking and placing of objects using planning with image goals. Notably, this is achieved without collecting any data from the robots in these environments, and without any task-specific training or reward. This work demonstrates how self-supervised learning from web-scale data and a small amount of robot interaction data can yield a world model capable of planning in the physical world.