Portrait of Sarath Chandar Anbil Parthipan

Sarath Chandar Anbil Parthipan

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Indian Institute of Technology Madras

Biography

Sarath Chandar is an assistant professor at Polytechnique Montréal, where he leads the Chandar Research Lab. He is also a core academic member of Mila – Quebec Artificial Intelligence Institute and holds a Canada CIFAR AI Chair and the Canada Research Chair in Lifelong Machine Learning.

Chandar’s research interests include lifelong learning, deep learning, optimization, reinforcement learning and natural language processing. To promote research in lifelong learning, Chandar created the Conference on Lifelong Learning Agents (CoLLAs) in 2022, for which he served as program chair in 2022 and 2023.

He has a PhD from Université de Montréal and an MSc (By Research) from the Indian Institute of Technology Madras.

Current Students

PhD - Polytechnique Montréal
Master's Research - Université de Montréal
PhD - Polytechnique Montréal
Co-supervisor :
Master's Research - Polytechnique Montréal
PhD - Université de Montréal
Postdoctorate - Polytechnique Montréal
PhD - Université de Montréal
Master's Research - Université de Montréal
Postdoctorate - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :
Master's Research - Université de Montréal
Collaborating Alumni - Université de Montréal
Principal supervisor :
Master's Research - Polytechnique Montréal
PhD - Polytechnique Montréal
Principal supervisor :
PhD - Polytechnique Montréal
Co-supervisor :
Master's Research - Université de Montréal
Postdoctorate - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Polytechnique Montréal

Publications

Towards Practical Tool Usage for Continually Learning LLMs
Jerry Huang
Prasanna Parthasarathi
Mehdi Rezagholizadeh
Large language models (LLMs) show an innate skill for solving language based tasks. But insights have suggested an inability to adjust for i… (see more)nformation or task-solving skills becoming outdated, as their knowledge, stored directly within their parameters, remains static in time. Tool use helps by offloading work to systems that the LLM can access through an interface, but LLMs that use them still must adapt to nonstationary environments for prolonged use, as new tools can emerge and existing tools can change. Nevertheless, tools require less specialized knowledge, therefore we hypothesize they are better suited for continual learning (CL) as they rely less on parametric memory for solving tasks and instead focus on learning when to apply pre-defined tools. To verify this, we develop a synthetic benchmark and follow this by aggregating existing NLP tasks to form a more realistic testing scenario. While we demonstrate scaling model size is not a solution, regardless of tool usage, continual learning techniques can enable tool LLMs to both adapt faster while forgetting less, highlighting their potential as continual learners.
Intelligent Switching for Reset-Free RL
Darshan Patil
Janarthanan Rajendran
Mastering Memory Tasks with World Models
Mohammad Reza Samsami
Artem Zholus
Janarthanan Rajendran
Current model-based reinforcement learning (MBRL) agents struggle with long-term dependencies. This limits their ability to effectively solv… (see more)e tasks involving extended time gaps between actions and outcomes, or tasks demanding the recalling of distant observations to inform current actions. To improve temporal coherence, we integrate a new family of state space models (SSMs) in world models of MBRL agents to present a new method, Recall to Imagine (R2I). This integration aims to enhance both long-term memory and long-horizon credit assignment. Through a diverse set of illustrative tasks, we systematically demonstrate that R2I not only establishes a new state-of-the-art for challenging memory and credit assignment RL tasks, such as BSuite and POPGym, but also showcases superhuman performance in the complex memory domain of Memory Maze. At the same time, it upholds comparable performance in classic RL tasks, such as Atari and DMC, suggesting the generality of our method. We also show that R2I is faster than the state-of-the-art MBRL method, DreamerV3, resulting in faster wall-time convergence.
Are self-explanations from Large Language Models faithful?
Learning Conditional Policies for Crystal Design Using Offline Reinforcement Learning
Prashant Govindarajan
Santiago Miret
Jarrid Rector-Brooks
Mariano Phielipp
Janarthanan Rajendran
Navigating through the exponentially large chemical space to search for desirable materials is an extremely challenging task in material dis… (see more)covery. Recent developments in generative and geometric deep learning have shown...
Fairness-Aware Structured Pruning in Transformers
A. Zayed
Goncalo Mordido
Samira Shabanian
Ioana Baldini
Measuring the Knowledge Acquisition-Utilization Gap in Pretrained Language Models
Amirhossein Kazemnejad
Mehdi Rezagholizadeh
Prasanna Parthasarathi
Language Model-In-The-Loop: Data Optimal Approach to Learn-To-Recommend Actions in Text Games
Arjun Vaithilingam Sudhakar
Prasanna Parthasarathi
Janarthanan Rajendran
Faithfulness Measurable Masked Language Models
EpiK-Eval: Evaluation for Language Models as Epistemic Models
Gabriele Prato
Jerry Huang
Prasanna Parthasarathi
Shagun Sodhani
In the age of artificial intelligence, the role of large language models (LLMs) is becoming increasingly central. Despite their growing prev… (see more)alence, their capacity to consolidate knowledge from different training documents—a crucial ability in numerous applications—remains unexplored. This paper presents the first study examining the capability of LLMs to effectively combine such information within their parameter space. We introduce EpiK-Eval, a novel question-answering benchmark tailored to evaluate LLMs' proficiency in formulating a coherent and consistent knowledge representation from segmented narratives. Evaluations across various LLMs reveal significant weaknesses in this domain. We contend that these shortcomings stem from the intrinsic nature of prevailing training objectives. Consequently, we advocate for refining the approach towards knowledge consolidation, as it harbors the potential to dramatically improve their overall effectiveness and performance. The findings from this study offer insights for developing more robust and reliable LLMs. Our code and benchmark are available at https://github.com/chandar-lab/EpiK-Eval
Self-Influence Guided Data Reweighting for Language Model Pre-training
Megh Thakkar
Tolga Bolukbasi
Sriram Ganapathy
Shikhar Vashishth
Partha Talukdar
Language Models (LMs) pre-trained with selfsupervision on large text corpora have become the default starting point for developing models fo… (see more)r various NLP tasks. Once the pre-training corpus has been assembled, all data samples in the corpus are treated with equal importance during LM pre-training. However, due to varying levels of relevance and quality of data, equal importance to all the data samples may not be the optimal choice. While data reweighting has been explored in the context of task-specific supervised learning and LM fine-tuning, model-driven reweighting for pretraining data has not been explored. We fill this important gap and propose PRESENCE, a method for jointly reweighting samples by leveraging self-influence (SI) scores as an indicator of sample importance and pre-training. PRESENCE promotes novelty and stability for model pre-training. Through extensive analysis spanning multiple model sizes, datasets, and tasks, we present PRESENCE as an important first step in the research direction of sample reweighting for pre-training language models.
Towards Few-shot Coordination: Revisiting Ad-hoc Teamplay Challenge In the Game of Hanabi
Hadi Nekoei
Xutong Zhao
Janarthanan Rajendran
Miao Liu
Cooperative Multi-agent Reinforcement Learning (MARL) algorithms with Zero-Shot Coordination (ZSC) have gained significant attention in rece… (see more)nt years. ZSC refers to the ability of agents to coordinate zero-shot (without additional interaction experience) with independently trained agents. While ZSC is crucial for cooperative MARL agents, it might not be possible for complex tasks and changing environments. Agents also need to adapt and improve their performance with minimal interaction with other agents. In this work, we show empirically that state-of-the-art ZSC algorithms have poor performance when paired with agents trained with different learning methods, and they require millions of interaction samples to adapt to these new partners. To investigate this issue, we formally defined a framework based on a popular cooperative multi-agent game called Hanabi to evaluate the adaptability of MARL methods. In particular, we created a diverse set of pre-trained agents and defined a new metric called adaptation regret that measures the agent's ability to efficiently adapt and improve its coordination performance when paired with some held-out pool of partners on top of its ZSC performance. After evaluating several SOTA algorithms using our framework, our experiments reveal that naive Independent Q-Learning (IQL) agents in most cases adapt as quickly as the SOTA ZSC algorithm Off-Belief Learning (OBL). This finding raises an interesting research question: How to design MARL algorithms with high ZSC performance and capability of fast adaptation to unseen partners. As a first step, we studied the role of different hyper-parameters and design choices on the adaptability of current MARL algorithms. Our experiments show that two categories of hyper-parameters controlling the training data diversity and optimization process have a significant impact on the adaptability of Hanabi agents.