Sarath Chandar

Biography

Sarath Chandar is an associate professor at Polytechnique Montreal's Department of Computer and Software Engineering, where he leads the Chandar Research Lab. He is also a Core Academic Member at Mila – Quebec Artificial Intelligence Institute and holds a Canada CIFAR AI Chair and the Canada Research Chair in Lifelong Machine Learning.

Chandar’s research interests include lifelong learning, deep learning, optimization, reinforcement learning and natural language processing. To promote research in lifelong learning, Chandar created the Conference on Lifelong Learning Agents (CoLLAs) in 2022, for which he served as program chair in 2022 and 2023.

He has a PhD from Université de Montréal and an MSc (By Research) from the Indian Institute of Technology Madras.

Current Students

Ista Abbes

Master's Research - Université de Montréal

Alex Aselstyne

Master's Research - Polytechnique Montréal

Davide Baldelli

PhD - Polytechnique Montréal

Co-supervisor :

Milan Bhan

Collaborating researcher

Diego Cerda Mardini

Master's Research - McGill University

Antoine Clavaud

Master's Research - Polytechnique Montréal

Naga Karthik Enamundram

PhD - Polytechnique Montréal

Principal supervisor :

Prashant Govindarajan

PhD - Polytechnique Montréal

Simon Guiroy

PhD - Université de Montréal

Principal supervisor :

Collaborating researcher

Principal supervisor :

Irina Rish

Maryam Hashemzadeh

PhD - Université de Montréal

David Heurtel--Depeiges

PhD - Polytechnique Montréal

Jerry Huang

PhD - Université de Montréal

Saurav Jha

Postdoctorate - Polytechnique Montréal

Amir Kalantari Dehaghi

Collaborating Alumni

Lola Le Breton

PhD - Polytechnique Montréal

Aidan Li

Master's Research - Université de Montréal

Co-supervisor :

Postdoctorate - Université de Montréal

PhD - Polytechnique Montréal

Roshan Munirathinam Sankaran Balaji

Collaborating researcher - Polytechnique Montréal

Hadi NekoeiQachkanloo

PhD - Université de Montréal

Nilaksh Nilaksh

PhD - Polytechnique Montréal

PhD - Université de Montréal

Linda Peinthiere

Collaborating researcher - Polytechnique Montréal

Yann Pernot

Master's Research - Polytechnique Montréal

Gabriele Prato

PhD - Université de Montréal

Postdoctorate

Shaipranesh Senthilkumar

PhD - Polytechnique Montréal

Arjun Vaithilingam Sudhakar

Nour Shaheen

Master's Research - Polytechnique Montréal

Principal supervisor :

PhD - Polytechnique Montréal

Anabel Tan

Postdoctorate - Université de Montréal

Megh Thakkar

Master's Research - Université de Montréal

PhD - Polytechnique Montréal

Shawn Whitfield

Collaborating researcher

Abdelrahman Zayed

PhD - Polytechnique Montréal

Xutong Zhao

PhD - Polytechnique Montréal

Artem Zholus

PhD - Polytechnique Montréal

Blog Posts

December 19, 2025

Improving CAD Design With LLMs

Prashant Govindarajan

Davide Baldelli

Quentin Fournier

Sarath Chandar

March 3, 2025

NeoBERT: A New Frontier for Open-Source Encoder Language Models

Lola Le Breton

Quentin Fournier

Sarath Chandar

October 1, 2024

How Do We Explain AI and Ensure the Explanation Is True? Faithfulness Measurable Models Tell You How

Andreas Madsen

Siva Reddy

Sarath Chandar

Publications

Should We Attend More or Less? Modulating Attention for Fairness

A. Zayed

Goncalo Mordido

Samira Shabanian

2023-05-22

ArXiv (preprint)

Conditionally Optimistic Exploration for Cooperative Deep Multi-Agent Reinforcement Learning

Xutong Zhao

Yangchen Pan

Chenjun Xiao

Efficient exploration is critical in cooperative deep Multi-Agent Reinforcement Learning (MARL). In this work, we propose an exploration method that effectively encourages cooperative exploration based on the idea of a sequential action-computation scheme. The high-level intuition is that to perform optimism-based exploration, agents would explore cooperative strategies if each agent’s optimism estimate captures a structured dependency relationship with other agents. Assuming agents compute actions following a sequential order at each environment timestep, we provide a perspective to view MARL as tree search iterations by considering agents as nodes at different depths of the search tree. Inspired by the theoretically justified tree search algorithm UCT (Upper Confidence bounds applied to Trees), we develop a method called Conditionally Optimistic Exploration (COE). COE augments each agent’s state-action value estimate with an action-conditioned optimistic bonus derived from the visitation count of the global state and joint actions of preceding agents. COE is performed during training and disabled at deployment, making it compatible with any value decomposition method for centralized training with decentralized execution. Experiments across various cooperative MARL benchmarks show that COE outperforms current state-of-the-art exploration methods on hard-exploration tasks.

2023-05-08

auai.org/UAI/2023/Conference (published)
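
The count-based bonus described in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: the class name `SequentialCountBonus`, the function `optimistic_action`, the constant `c`, and the exact UCT-style formula `c * sqrt(ln N(parent) / N(child))` are assumptions made for illustration, and the bonus used in the paper may differ. The sketch assumes hashable states and discrete actions, with agents acting in a fixed order.

```python
import math
from collections import defaultdict


class SequentialCountBonus:
    """Hypothetical helper: UCT-style bonus over (state, preceding actions, own action)."""

    def __init__(self, c=1.0):
        self.c = c  # exploration constant (assumed, not taken from the paper)
        self.counts = defaultdict(int)  # visit counts keyed by (state, action prefix)

    def update(self, state, joint_action):
        # Count every prefix of the joint action, including the empty prefix,
        # so each agent has a "parent" count and a "child" count.
        for k in range(len(joint_action) + 1):
            self.counts[(state, tuple(joint_action[:k]))] += 1

    def bonus(self, state, preceding_actions, action):
        prefix = tuple(preceding_actions)
        n_parent = self.counts.get((state, prefix), 0)
        n_child = self.counts.get((state, prefix + (action,)), 0)
        if n_child == 0:
            return float("inf")  # untried actions are maximally optimistic
        return self.c * math.sqrt(math.log(n_parent) / n_child)


def optimistic_action(q_values, tracker, state, preceding_actions):
    """Pick the action maximizing Q plus the conditional bonus (training time only)."""
    scores = [
        q + tracker.bonus(state, preceding_actions, a)
        for a, q in enumerate(q_values)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

At deployment the bonus would simply be dropped and each agent would act greedily on its own value estimate, which is consistent with the abstract's statement that COE is disabled at execution time.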

Behavioral Cloning for Crystal Design

Prashant Govindarajan

Santiago Miret

Jarrid Rector-Brooks

Mariano Phielipp

Solid-state materials, which are made up of periodic 3D crystal structures, are particularly useful for a variety of real-world applications such as batteries, fuel cells and catalytic materials. Designing solid-state materials, especially in a robust and automated fashion, remains an ongoing challenge. To further the automated design of crystalline materials, we propose a method to learn to design valid crystal structures given a crystal skeleton. By incorporating Euclidean equivariance into a policy network, we portray the problem of designing new crystals as a sequential prediction task suited for imitation learning. At each step, given an incomplete graph of a crystal skeleton, an agent assigns an element to a specific node. We adopt a behavioral cloning strategy to train the policy network on data consisting of curated trajectories generated from known crystals.

2023-03-17

ICLR.cc/2023/Workshop/ML4Materials (poster)

Dealing With Non-stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning

Hadi Nekoei

Akilesh Badrinaaraayanan

Amit Sinha

Mohammad Amin Amini

Aditya Mahajan

2023-02-06

ArXiv (preprint)

An Empirical Investigation of the Role of Pre-training in Lifelong Learning

Sanket Vaibhav Mehta

Darshan Patil

Emma Strubell

The lifelong learning paradigm in machine learning is an attractive alternative to the more prominent isolated learning scheme not only due to its resemblance to biological learning, but also its potential to reduce energy waste by obviating excessive model re-training. A key challenge to this paradigm is the phenomenon of catastrophic forgetting. With the increasing popularity and success of pre-trained models in machine learning, we pose the question: What role does pre-training play in lifelong learning, specifically with respect to catastrophic forgetting? We investigate existing methods in the context of large, pre-trained models and evaluate their performance on a variety of text and image classification tasks, including a large-scale study using a novel dataset of 15 diverse NLP tasks. Across all settings, we observe that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially compared to randomly initialized models. We then further investigate why pre-training alleviates forgetting in this setting. We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima. Based on this insight, we propose jointly optimizing for current task loss and loss basin sharpness in order to explicitly encourage wider basins during sequential fine-tuning. We show that this optimization approach leads to performance comparable to the state-of-the-art in task-sequential continual learning across multiple settings, without retaining a memory that scales in size with the number of tasks.
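
One well-known way to optimize jointly for task loss and basin sharpness is a sharpness-aware minimization (SAM) style update. The abstract above does not spell out the exact procedure, so the following is only a sketch under that assumption. The function names `sam_step`, `quad_grad` and `quad_loss`, the toy quadratic loss, and the values of `lr` and `rho` are all hypothetical choices for illustration.

```python
import math


def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One SAM-style update: take the gradient at a nearby worst-case point."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # already at a stationary point
    # Move a distance rho along the normalized gradient (approximate ascent step).
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descend from the original weights using the gradient at the perturbed point.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def quad_loss(w):
    # Toy loss with one sharp direction (w[0]) and one flat direction (w[1]).
    return 0.5 * (10.0 * w[0] ** 2 + w[1] ** 2)


def quad_grad(w):
    # Analytic gradient of quad_loss.
    return [10.0 * w[0], w[1]]


w = [1.0, 1.0]
for _ in range(50):
    w = sam_step(w, quad_grad)
print(quad_loss(w))
```

In a sequential fine-tuning setting, the same two-gradient step would be applied to the current task's loss at every update, so no memory of earlier tasks is needed.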

Replay Buffer with Local Forgetting for Adapting to Local Environment Changes in Deep Model-Based Reinforcement Learning

Ali Rahimi-Kalahroudi

Ida Momennejad

Harm van Seijen

2023-01-01

CoLLAs (published)

proceedings.mlr.press

Self-Influence Guided Data Reweighting for Language Model Pre-training

Megh Thakkar

Tolga Bolukbasi

Sriram Ganapathy

Shikhar Vashishth

Partha Talukdar

Language Models (LMs) pre-trained with self-supervision on large text corpora have become the default starting point for developing models for various NLP tasks. Once the pre-training corpus has been assembled, all data samples in the corpus are treated with equal importance during LM pre-training. However, due to varying levels of relevance and quality of data, equal importance to all the data samples may not be the optimal choice. While data reweighting has been explored in the context of task-specific supervised learning and LM fine-tuning, model-driven reweighting for pre-training data has not been explored. We fill this important gap and propose PRESENCE, a method for jointly reweighting samples by leveraging self-influence (SI) scores as an indicator of sample importance and pre-training. PRESENCE promotes novelty and stability for model pre-training. Through extensive analysis spanning multiple model sizes, datasets, and tasks, we present PRESENCE as an important first step in the research direction of sample reweighting for pre-training language models.

2023-01-01

EMNLP (published)

Post-hoc Interpretability for Neural NLP: A Survey

Andreas Madsen

Siva Reddy

2022-12-23

ACM Computing Surveys (published)

Replay Buffer With Local Forgetting for Adaptive Deep Model-Based Reinforcement Learning

Ali Rahimi-Kalahroudi

Ida Momennejad

Harm van Seijen

One of the key behavioral characteristics used in neuroscience to determine whether the subject of study, be it a rodent or a human, exhibits model-based learning is effective adaptation to local changes in the environment. In reinforcement learning, however, recent work has shown that modern deep model-based reinforcement-learning (MBRL) methods adapt poorly to such changes. An explanation for this mismatch is that MBRL methods are typically designed with sample-efficiency on a single task in mind and the requirements for effective adaptation are substantially higher, both in terms of the learned world model and the planning routine. One particularly challenging requirement is that the learned world model has to be sufficiently accurate throughout relevant parts of the state-space. This is challenging for deep-learning-based world models due to catastrophic forgetting. And while a replay buffer can mitigate the effects of catastrophic forgetting, the traditional first-in-first-out replay buffer precludes effective adaptation due to maintaining stale data. In this work

2022-12-09

NeurIPS.cc/2022/Workshop/DeepRL (unknown)

Improving Meta-Learning Generalization with Activation-Based Early-Stopping

2022-11-28

Proceedings of The 1st Conference on Lifelong Learning Agents (published)