Sarath Chandar

Biography

Sarath Chandar is an associate professor at Polytechnique Montreal's Department of Computer and Software Engineering, where he leads the Chandar Research Lab. He is also a Core Academic Member at Mila – Quebec Artificial Intelligence Institute and holds a Canada CIFAR AI Chair and the Canada Research Chair in Lifelong Machine Learning.

Chandar’s research interests include lifelong learning, deep learning, optimization, reinforcement learning and natural language processing. To promote research in lifelong learning, Chandar created the Conference on Lifelong Learning Agents (CoLLAs) in 2022, for which he served as program chair in 2022 and 2023.

He has a PhD from Université de Montréal and an MSc (By Research) from the Indian Institute of Technology Madras.

Current Students

Ista Abbes

Master's Research - Université de Montréal

Davide Baldelli

PhD - Polytechnique Montréal

Co-supervisor :

Master's Research - Polytechnique Montréal

Naga Karthik Enamundram

PhD - Polytechnique Montréal

Principal supervisor :

Julien Cohen-Adad

emvnagakarthik@gmail.com

Prashant Govindarajan

PhD - Polytechnique Montréal

Simon Guiroy

PhD - Université de Montréal

Principal supervisor :

Collaborating researcher - Université de Montréal

Principal supervisor :

Liam Paull

Maryam Hashemzadeh

PhD - Université de Montréal

David Heurtel--Depeiges

PhD - Polytechnique Montréal

Amir Ardalan Kalantari Dehaghi

Jerry Huang

PhD - Université de Montréal

Collaborating Alumni

Lola Le Breton

Master's Research - Polytechnique Montréal

Ekaterina Lobacheva

Postdoctorate - Université de Montréal

PhD - Polytechnique Montréal

Roshan Balaji Munirathinam Sankaran Balaji

Mohamed Amine Merzouk

Postdoctorate - Polytechnique Montréal

Principal supervisor :

Research Intern - Polytechnique Montréal

Hadi NekoeiQachkanloo

PhD - Université de Montréal

Darshan Patil

PhD - Université de Montréal

Gabriele Prato

PhD - Université de Montréal

Postdoctorate

Independent visiting researcher

Mohammad R. Samsami

Master's Research - Université de Montréal

Master's Research - Polytechnique Montréal

Arjun Vaithilingam Sudhakar

Megh Thakkar

Master's Research - Université de Montréal

PhD - Polytechnique Montréal

Kowen Woo

Research Intern - Polytechnique Montréal

Abdelrahman Zayed

PhD - Polytechnique Montréal

Xutong Zhao

PhD - Polytechnique Montréal

Artem Zholus

PhD - Polytechnique Montréal

NeoBERT: A New Frontier for Open-Source Encoder Language Models

Blog Posts

A digital picture of Bert from Sesame street, wering black trench coat and sunglasses

March 3, 2025

Lola Le Breton

Quentin Fournier

Sarath Chandar

Read the article

October 1, 2024

How Do We Explain AI and Ensure the Explanation Is True? Faithfulness Measurable Models Tell You How

Andrea Madsen

Siva Reddy

Sarath Chandar

Read the article

Publications

Torque-Aware Momentum

Pranshu Malviya

Goncalo Mordido

Aristide Baratin

Reza Babanezhad Harikandeh

Gintare Karolina Dziugaite

Razvan Pascanu

2024-12-25

ArXiv (preprint)

Gintare Karolina Dziugaite

Torque-Aware Momentum

Pranshu Malviya

Goncalo Mordido

Aristide Baratin

Reza Babanezhad Harikandeh

Razvan Pascanu

Efficiently exploring complex loss landscapes is key to the performance of deep neural networks. While momentum-based optimizers are widely … (see more)used in state-of-the-art setups, classical momentum can still struggle with large, misaligned gradients, leading to oscillations. To address this, we propose Torque-Aware Momentum (TAM), which introduces a damping factor based on the angle between the new gradients and previous momentum, stabilizing the update direction during training. Empirical results show that TAM, which can be combined with both SGD and Adam, enhances exploration, handles distribution shifts more effectively, and improves generalization performance across various tasks, including image classification and large language model fine-tuning, when compared to classical momentum-based optimizers.

2024-12-25

ArXiv (preprint)

Gintare Karolina Dziugaite

Torque-Aware Momentum

Pranshu Malviya

Goncalo Mordido

Aristide Baratin

Reza Babanezhad Harikandeh

Razvan Pascanu

2024-12-25

ArXiv (preprint)

Too Big to Fool: Resisting Deception in Language Models

Mohammad Reza Samsami

M. L. Richter

Juan Rodriguez

Megh Thakkar

Large language models must balance their weight-encoded knowledge with in-context information from prompts to generate accurate responses. T… (see more)his paper investigates this interplay by analyzing how models of varying capacities within the same family handle intentionally misleading in-context information. Our experiments demonstrate that larger models exhibit higher resilience to deceptive prompts, showcasing an advanced ability to interpret and integrate prompt information with their internal knowledge. Furthermore, we find that larger models outperform smaller ones in following legitimate instructions, indicating that their resilience is not due to disregarding in-context information. We also show that this phenomenon is likely not a result of memorization but stems from the models' ability to better leverage implicit task-relevant information from the prompt alongside their internally stored knowledge.

2024-12-13

ArXiv (preprint)

Too Big to Fool: Resisting Deception in Language Models

Mohammad Reza Samsami

Mats Leon Richter

Juan A. Rodriguez

Megh Thakkar

2024-12-13

ArXiv (preprint)

openreview.net

Too Big to Fool: Resisting Deception in Language Models

Mohammad Reza Samsami

M. L. Richter

Juan Rodriguez

Megh Thakkar

2024-12-13

ArXiv (preprint)

Too Big to Fool: Resisting Deception in Language Models

Mohammad Reza Samsami

M. L. Richter

Juan Rodriguez

Megh Thakkar

2024-12-13

ArXiv (preprint)

Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs

Megh Thakkar

Yash More

Quentin Fournier

Matthew D Riemer

Pin-Yu Chen

Amal Zouaq

Payel Das

There is a growing interest in training domain-expert LLMs that excel in specific technical fields compared to their general-purpose instruc… (see more)tion-tuned counterparts. However, these expert models often experience a loss in their safety abilities in the process, making them capable of generating harmful content. As a solution, we introduce an efficient and effective merging-based alignment method called \textsc{MergeAlign} that interpolates the domain and alignment vectors, creating safer domain-specific models while preserving their utility. We apply \textsc{MergeAlign} on Llama3 variants that are experts in medicine and finance, obtaining substantial alignment improvements with minimal to no degradation on domain-specific benchmarks. We study the impact of model merging through model similarity metrics and contributions of individual models being merged. We hope our findings open new research avenues and inspire more efficient development of safe expert LLMs.

2024-11-11

ArXiv (preprint)

Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs

Megh Thakkar

Yash More

Quentin Fournier

Matthew D Riemer

Pin-Yu Chen

Amal Zouaq

Payel Das

2024-11-11

ArXiv (preprint)

Crystal Design Amidst Noisy DFT Signals: A Reinforcement Learning Approach

Prashant Govindarajan

Mathieu Reymond

Santiago Miret

Mariano Phielipp

2024-11-03

NeurIPS.cc/2024/Workshop/AI4Mat (published)

openreview.net

Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination

Jerry Huang

Prasanna Parthasarathi

Mehdi Rezagholizadeh

Boxing Chen

The growth in prominence of large language models (LLMs) in everyday life can be largely attributed to their generative abilities, yet some … (see more)of this is also owed to the risks and costs associated with their use. On one front is their tendency to \textit{hallucinate} false or misleading information, limiting their reliability. On another is the increasing focus on the computational limitations associated with traditional self-attention based LLMs, which has brought about new alternatives, in particular recurrent models, meant to overcome them. Yet it remains uncommon to consider these two concerns simultaneously. Do changes in architecture exacerbate/alleviate existing concerns about hallucinations? Do they affect how and where they occur? Through an extensive evaluation, we study how these architecture-based inductive biases affect the propensity to hallucinate. While hallucination remains a general phenomenon not limited to specific architectures, the situations in which they occur and the ease with which specific types of hallucinations can be induced can significantly differ based on the model architecture. These findings highlight the need for better understanding both these problems in conjunction with each other, as well as consider how to design more universal techniques for handling hallucinations.

2024-10-22

ArXiv (preprint)

Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs

Megh Thakkar

Yash More

Quentin Fournier

Matthew D Riemer

Pin-Yu Chen

Amal Zouaq

Payel Das

2024-10-10

NeurIPS.cc/2024/Workshop/AFM (poster)