Sarath Chandar

Biography

Sarath Chandar is an associate professor at Polytechnique Montreal's Department of Computer and Software Engineering, where he leads the Chandar Research Lab. He is also a Core Academic Member at Mila – Quebec Artificial Intelligence Institute and holds a Canada CIFAR AI Chair and the Canada Research Chair in Lifelong Machine Learning.

Chandar’s research interests include lifelong learning, deep learning, optimization, reinforcement learning and natural language processing. To promote research in lifelong learning, Chandar created the Conference on Lifelong Learning Agents (CoLLAs) in 2022, for which he served as program chair in 2022 and 2023.

He has a PhD from Université de Montréal and an MSc (By Research) from the Indian Institute of Technology Madras.

Current Students

Ista Abbes

Master's Research - Université de Montréal

Alex Aselstyne

Master's Research - Polytechnique Montréal

Davide Baldelli

PhD - Polytechnique Montréal

Co-supervisor :

Milan Bhan

Collaborating researcher

Diego Cerda Mardini

Master's Research - McGill University

Antoine Clavaud

Master's Research - Polytechnique Montréal

Naga Karthik Enamundram

PhD - Polytechnique Montréal

Principal supervisor :

Prashant Govindarajan

PhD - Polytechnique Montréal

Simon Guiroy

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

David Heurtel-Depeiges

PhD - Polytechnique Montréal

Jerry Huang

PhD - Université de Montréal

Saurav Jha

Postdoctorate - Polytechnique Montréal

Amir Kalantari Dehaghi

Collaborating Alumni

Lola Le Breton

PhD - Polytechnique Montréal

Aidan Li

Master's Research - Université de Montréal

Co-supervisor :

Postdoctorate - Université de Montréal

PhD - Polytechnique Montréal

Roshan Munirathinam Sankaran Balaji

Research Intern - Polytechnique Montréal

Rayen Nacef

Collaborating researcher - Polytechnique Montréal

Hadi NekoeiQachkanloo

PhD - Université de Montréal

Nilaksh Nilaksh

PhD - Polytechnique Montréal

PhD - Université de Montréal

Linda Peinthiere

Collaborating researcher - Polytechnique Montréal

Gabriele Prato

PhD - Université de Montréal

Postdoctorate

Shaipranesh Senthilkumar

PhD - Polytechnique Montréal

Arjun Vaithilingam Sudhakar

Nour Shaheen

Master's Research - Polytechnique Montréal

Principal supervisor :

PhD - Polytechnique Montréal

Megh Thakkar

Master's Research - Université de Montréal

PhD - Polytechnique Montréal

Shawn Whitfield

Collaborating researcher

Kowen Woo

Research Intern - Polytechnique Montréal

Anabel XL

Postdoctorate - Université de Montréal

PhD - Polytechnique Montréal

Xutong Zhao

PhD - Polytechnique Montréal

Artem Zholus

PhD - Polytechnique Montréal

NeoBERT: A New Frontier for Open-Source Encoder Language Models

Blog Posts

A digital picture of Bert from Sesame Street, wearing a black trench coat and sunglasses

March 3, 2025

Lola Le Breton

Quentin Fournier

Sarath Chandar

October 1, 2024

How Do We Explain AI and Ensure the Explanation Is True? Faithfulness Measurable Models Tell You How

Andreas Madsen

Siva Reddy

Sarath Chandar

Publications

Balancing Context Length and Mixing Times for Reinforcement Learning at Scale

Matthew D Riemer

Khimya Khetarpal

Janarthanan Rajendran

2024-09-25

NeurIPS.cc/2024/Conference (poster)

Protein Language Models: Is Scaling Necessary?

Quentin Fournier

Robert M. Vernon

Almer Van Der Sloot

Benjamin Schulz

Christopher James Langmead

Public protein sequence databases contain samples from the fitness landscape explored by nature. Protein language models (pLMs) pre-trained on these sequences aim to capture this landscape for tasks like property prediction and protein design. Following the same trend as in natural language processing, pLMs have continuously been scaled up. However, the premise that scale leads to better performance assumes that source databases provide an accurate representation of the underlying fitness landscape, which is likely false. By developing an efficient codebase, designing a modern architecture, and addressing data quality concerns such as sample bias, we introduce AMPLIFY, a best-in-class pLM that is orders of magnitude less expensive to train and deploy than previous models. Furthermore, to support the scientific community and democratize the training of pLMs, we have open-sourced AMPLIFY’s pre-training codebase, data, and model checkpoints.

2024-09-23

bioRxiv (preprint)

Are self-explanations from Large Language Models faithful?

Andreas Madsen

Siva Reddy

2024-08-01

Findings of the Association for Computational Linguistics ACL 2024 (published)

Should We Attend More or Less? Modulating Attention for Fairness

Samira Shabanian

2024-07-10

colmweb.org/COLM/2024/Conference (accepted)

Lookbehind-SAM: k steps back, 1 step forward

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

proceedings.mlr.press

A Reinforcement Learning Pipeline for Band Gap-directed Crystal Generation

Prashant Govindarajan

Mathieu Reymond

Santiago Miret

Antoine Clavaud

Mariano Phielipp

Property-driven AI-automated material discovery presents unique challenges owing to the complex nature of the chemical structural space and computationally expensive simulations. For crystalline solids, the band gap is an important property for designing semiconductors and batteries. However, optimizing crystals for a target band gap is difficult and not well-explored. Reinforcement learning (RL) shows promise towards optimizing crystals, as it can freely explore the chemical space. However, it relies on regular band gap evaluations, which can only be accurately computed through expensive Density Functional Theory (DFT) simulations. In this study, we propose an active learning-inspired pipeline that combines RL and DFT simulations for optimizing crystal compositions given a target band gap. The pipeline includes an RL policy for predicting atom types and a band gap network that is fine-tuned with DFT data. Preliminary results indicate the need for furthering the state-of-the-art to address the inherent challenges of the problem.

2024-07-08

BOKU.ac.at/2024/AI4Mat (poster)

Language Model-In-The-Loop: Data Optimal Approach to Recommend Actions in Text Games

Arjun V Sudhakar

Prasanna Parthasarathi

Janarthanan Rajendran

Large Language Models (LLMs) have demonstrated superior performance in language understanding benchmarks. A recent use case for LLMs involves training decision-making agents over textual information. The existing approach leverages LLMs' linguistic priors for action candidate recommendations in text games, i.e., to operate without environment-provided actions. However, adapting LLMs to specific games/tasks requires a massive amount of annotated human gameplay. Moreover, in the existing approach, the language model was kept frozen during an agent's training process, which limits learning from in-game knowledge about the world. Hence, we explore strategies to adapt the language model for candidate recommendation with in-game transition in an online learning fashion to mitigate reliance on human-annotated gameplays, which are costly to acquire. In this paper, we propose in-game transition selection methods to adapt the LLM in the loop, reducing the dependency on using human-annotated gameplays while improving performance and convergence. Our method demonstrates a 53% relative improvement in average game score over the previous state-of-the-art model, achieving more than twice the convergence rate in a fully annotated dataset setting. Furthermore, even with only 10% of human annotation, we surpassed the 100% state-of-the-art performance benchmark.

2024-06-20

rl-conference.cc/RLC/2024/Workshop/TAFM (published)

Promoting Exploration in Memory-Augmented Adam using Critical Momenta

Pranshu Malviya

Reza Babanezhad Harikandeh

Aristide Baratin

Adaptive gradient-based optimizers, particularly Adam, have left their mark in training large-scale deep learning models. The strength of such optimizers is that they exhibit fast convergence while being more robust to hyperparameter choice. However, they often generalize worse than non-adaptive methods. Recent studies have tied this performance gap to flat minima selection: adaptive methods tend to find solutions in sharper basins of the loss landscape, which in turn hurts generalization. To overcome this issue, we propose a new memory-augmented version of Adam that promotes exploration towards flatter minima by using a buffer of critical momentum terms during training. Intuitively, the use of the buffer makes the optimizer overshoot outside the basin of attraction if it is not wide enough. We empirically show that our method improves the performance of several variants of Adam on standard supervised language modelling and image classification tasks.

2024-06-09

TMLR (accepted)

Why Don't Prompt-Based Fairness Metrics Correlate?

Ioana Baldini

The widespread use of large language models has brought up essential questions about the potential biases these models might learn. This led to the development of several metrics aimed at evaluating and mitigating these biases. In this paper, we first demonstrate that prompt-based fairness metrics exhibit poor agreement, as measured by correlation, raising important questions about the reliability of fairness assessment using prompts. Then, we outline six relevant reasons why such a low correlation is observed across existing metrics. Based on these insights, we propose a method called Correlated Fairness Output (CAIRO) to enhance the correlation between fairness metrics. CAIRO augments the original prompts of a given fairness metric by using several pre-trained language models and then selects the combination of the augmented prompts that achieves the highest correlation across metrics. We show a significant improvement in Pearson correlation from 0.3 and 0.18 to 0.90 and 0.98 across metrics for gender and religion biases, respectively. Our code is available at https://github.com/chandar-lab/CAIRO.

2024-06-09

ArXiv (preprint)

A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques

Megh Thakkar

Quentin Fournier

Matthew D Riemer

Pin-Yu Chen

Amal Zouaq

Payel Das

Large language models are first pre-trained on trillions of tokens and then instruction-tuned or aligned to specific preferences. While pre-training remains out of reach for most researchers due to the compute required, fine-tuning has become affordable thanks to parameter-efficient methods such as LoRA and QLoRA. Alignment is known to be sensitive to the many factors involved, including the quantity and quality of data, the alignment method, and the adapter rank. However, there has not yet been an extensive study of their effect on downstream performance. To address this gap, we conduct an in-depth investigation of the impact of popular choices for three crucial axes: (i) the alignment dataset (HH-RLHF and BeaverTails), (ii) the alignment technique (SFT and DPO), and (iii) the model (LLaMA-1, Vicuna-v1.3, Mistral-7b, and Mistral-7b-Instruct). Our extensive setup spanning over 300 experiments reveals consistent trends and unexpected findings. We observe how more informative data helps with preference alignment, cases where supervised fine-tuning outperforms preference optimization, and how aligning to a distinct preference boosts performance on downstream tasks. Through our in-depth analyses, we put forward key guidelines to help researchers perform more effective parameter-efficient LLM alignment.

2024-06-07

ArXiv (preprint)