Portrait of Sarath Chandar

Sarath Chandar

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Indian Institute of Technology Madras
Research Topics
AI Alignment
Deep Learning
Explainable AI (XAI)
Foundation Models
Interpretability
Large Language Models (LLM)
Lifelong Learning
Medical Machine Learning
Multi-Agent Systems
Natural Language Processing
Online Learning
Optimization
Recurrent Neural Networks
Reinforcement Learning
Representation Learning
Transfer Learning
Trustworthy AI

Biography

Sarath Chandar is an associate professor at Polytechnique Montreal's Department of Computer and Software Engineering, where he leads the Chandar Research Lab. He is also a Core Academic Member at Mila – Quebec Artificial Intelligence Institute and holds a Canada CIFAR AI Chair and the Canada Research Chair in Lifelong Machine Learning.

Chandar’s research interests include lifelong learning, deep learning, optimization, reinforcement learning and natural language processing. To promote research in lifelong learning, Chandar created the Conference on Lifelong Learning Agents (CoLLAs) in 2022, for which he served as program chair in 2022 and 2023.

He has a PhD from Université de Montréal and an MSc (By Research) from the Indian Institute of Technology Madras.

Current Students

Master's Research - Université de Montréal
PhD - Polytechnique Montréal
Co-supervisor :
Master's Research - Polytechnique Montréal
PhD - Polytechnique Montréal
Principal supervisor :
PhD - Polytechnique Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - Polytechnique Montréal
PhD - Université de Montréal
Master's Research - Polytechnique Montréal
Postdoctorate - Université de Montréal
PhD - Polytechnique Montréal
Co-supervisor :
PhD - Polytechnique Montréal
Master's Research - Polytechnique Montréal
Postdoctorate - Polytechnique Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Collaborating Alumni - Université de Montréal
Co-supervisor :
Independent visiting researcher
Master's Research - Université de Montréal
Master's Research - Polytechnique Montréal
Master's Research - Université de Montréal
PhD - Polytechnique Montréal
PhD - Polytechnique Montréal
PhD - Polytechnique Montréal
PhD - Polytechnique Montréal

Publications

Protein Language Models: Is Scaling Necessary?
Quentin Fournier
Robert M. Vernon
Almer van der Sloot
Benjamin Schulz
Christopher James Langmead
Are self-explanations from Large Language Models faithful?
Andreas Madsen
Should We Attend More or Less? Modulating Attention for Fairness
Abdelrahman Zayed
Goncalo Mordido
Samira Shabanian
Lookbehind-SAM: k steps back, 1 step forward
Goncalo Mordido
Pranshu Malviya
Aristide Baratin
A Reinforcement Learning Pipeline for Band Gap-directed Crystal Generation
Prashant Govindarajan
Mathieu Reymond
Santiago Miret
Antoine Clavaud
Mariano Phielipp
Property-driven AI-automated material discovery presents unique challenges owing to the complex nature of the chemical structural space and … (see more)computationally expensive simulations. For crystalline solids, the band gap is an important property for designing semiconductors and batteries. However, optimizing crystals for a target band gap is difficult and not well-explored. Reinforcement learning (RL) shows promise towards optimizing crystals, as it can freely explore the chemical space. However, it relies on regular band gap evaluations, which can only be accurately computed through expensive Density Functional Theory (DFT) simulations. In this study, we propose an active learning-inspired pipeline that combines RL and DFT simulations for optimizing crystal compositions given a target band gap. The pipeline includes an RL policy for predicting atom types and a band gap network that is fine-tuned with DFT data. Preliminary results indicate the need for furthering the state-of-the-art to address the inherent challenges of the problem.
Language Model-In-The-Loop: Data Optimal Approach to Recommend Actions in Text Games
Arjun V Sudhakar
Prasanna Parthasarathi
Janarthanan Rajendran
Large Language Models (LLMs) have demonstrated superior performance in language understanding benchmarks. A recent use case for LLMs involve… (see more)s training decision-making agents over textual information. The existing approach leverages LLM's linguistic priors for action candidate recommendations in text games, i.e., to operate without environment-provided actions. However, adapting LLMs to specific games/tasks requires a massive amount of annotated human gameplay. Moreover, in the existing approach, the language model was kept frozen during an agent's training process, which limits learning from in-game knowledge about the world. Hence, we explore strategies to adapt the language model for candidate recommendation with in-game transition in an online learning fashion to mitigate reliance on human-annotated gameplays, which are costly to acquire. In this paper, we propose in-game transition selection methods to adapt the LLM in the loop, reducing the dependency on using human-annotated gameplays while improving performance and convergence. Our method demonstrates a 53% relative improvement in average game score over the previous state-of-the-art model, achieving more than twice the convergence rate in a full-annotated dataset setting. Furthermore, even with only 10% of human annotation, we surpassed the 100\% state-of-the-art performance benchmark.
Promoting Exploration in Memory-Augmented Adam using Critical Momenta
Pranshu Malviya
Goncalo Mordido
Aristide Baratin
Reza Babanezhad Harikandeh
Jerry Huang
Adaptive gradient-based optimizers, particularly Adam, have left their mark in training large-scale deep learning models. The strength of su… (see more)ch optimizers is that they exhibit fast convergence while being more robust to hyperparameter choice. However, they often generalize worse than non-adaptive methods. Recent studies have tied this performance gap to flat minima selection: adaptive methods tend to find solutions in sharper basins of the loss landscape, which in turn hurts generalization. To overcome this issue, we propose a new memory-augmented version of Adam that promotes exploration towards flatter minima by using a buffer of critical momentum terms during training. Intuitively, the use of the buffer makes the optimizer overshoot outside the basin of attraction if it is not wide enough. We empirically show that our method improves the performance of several variants of Adam on standard supervised language modelling and image classification tasks.
Why Don't Prompt-Based Fairness Metrics Correlate?
Abdelrahman Zayed
Goncalo Mordido
Ioana Baldini
The widespread use of large language models has brought up essential questions about the potential biases these models might learn. This led… (see more) to the development of several metrics aimed at evaluating and mitigating these biases. In this paper, we first demonstrate that prompt-based fairness metrics exhibit poor agreement, as measured by correlation, raising important questions about the reliability of fairness assessment using prompts. Then, we outline six relevant reasons why such a low correlation is observed across existing metrics. Based on these insights, we propose a method called Correlated Fairness Output (CAIRO) to enhance the correlation between fairness metrics. CAIRO augments the original prompts of a given fairness metric by using several pre-trained language models and then selects the combination of the augmented prompts that achieves the highest correlation across metrics. We show a significant improvement in Pearson correlation from 0.3 and 0.18 to 0.90 and 0.98 across metrics for gender and religion biases, respectively. Our code is available at https://github.com/chandar-lab/CAIRO.
Why Don't Prompt-Based Fairness Metrics Correlate?
Abdelrahman Zayed
Goncalo Mordido
Ioana Baldini
The widespread use of large language models has brought up essential questions about the potential biases these models might learn. This led… (see more) to the development of several metrics aimed at evaluating and mitigating these biases. In this paper, we first demonstrate that prompt-based fairness metrics exhibit poor agreement, as measured by correlation, raising important questions about the reliability of fairness assessment using prompts. Then, we outline six relevant reasons why such a low correlation is observed across existing metrics. Based on these insights, we propose a method called Correlated Fairness Output (CAIRO) to enhance the correlation between fairness metrics. CAIRO augments the original prompts of a given fairness metric by using several pre-trained language models and then selects the combination of the augmented prompts that achieves the highest correlation across metrics. We show a significant improvement in Pearson correlation from 0.3 and 0.18 to 0.90 and 0.98 across metrics for gender and religion biases, respectively. Our code is available at https://github.com/chandar-lab/CAIRO.
A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques
Megh Thakkar
Quentin Fournier
Matthew D Riemer
Pin-Yu Chen
Payel Das
Large language models are first pre-trained on trillions of tokens and then instruction-tuned or aligned to specific preferences. While pre-… (see more)training remains out of reach for most researchers due to the compute required, fine-tuning has become affordable thanks to parameter-efficient methods such as LoRA and QLoRA. Alignment is known to be sensitive to the many factors involved, including the quantity and quality of data, the alignment method, and the adapter rank. However, there has not yet been an extensive study of their effect on downstream performance. To address this gap, we conduct an in-depth investigation of the impact of popular choices for three crucial axes: (i) the alignment dataset (HH-RLHF and BeaverTails), (ii) the alignment technique (SFT and DPO), and (iii) the model (LLaMA-1, Vicuna-v1.3, Mistral-7b, and Mistral-7b-Instruct). Our extensive setup spanning over 300 experiments reveals consistent trends and unexpected findings. We observe how more informative data helps with preference alignment, cases where supervised fine-tuning outperforms preference optimization, and how aligning to a distinct preference boosts performance on downstream tasks. Through our in-depth analyses, we put forward key guidelines to help researchers perform more effective parameter-efficient LLM alignment.
A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques
Megh Thakkar
Quentin Fournier
Matthew D Riemer
Pin-Yu Chen
Payel Das
Large language models are first pre-trained on trillions of tokens and then instruction-tuned or aligned to specific preferences. While pre-… (see more)training remains out of reach for most researchers due to the compute required, fine-tuning has become affordable thanks to parameter-efficient methods such as LoRA and QLoRA. Alignment is known to be sensitive to the many factors involved, including the quantity and quality of data, the alignment method, and the adapter rank. However, there has not yet been an extensive study of their effect on downstream performance. To address this gap, we conduct an in-depth investigation of the impact of popular choices for three crucial axes: (i) the alignment dataset (HH-RLHF and BeaverTails), (ii) the alignment technique (SFT and DPO), and (iii) the model (LLaMA-1, Vicuna-v1.3, Mistral-7b, and Mistral-7b-Instruct). Our extensive setup spanning over 300 experiments reveals consistent trends and unexpected findings. We observe how more informative data helps with preference alignment, cases where supervised fine-tuning outperforms preference optimization, and how aligning to a distinct preference boosts performance on downstream tasks. Through our in-depth analyses, we put forward key guidelines to help researchers perform more effective parameter-efficient LLM alignment.
A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques
Megh Thakkar
Quentin Fournier
Matthew D Riemer
Pin-Yu Chen
Payel Das