Sarath Chandar

Biography

Sarath Chandar is an Associate Professor in the Department of Computer and Software Engineering at Polytechnique Montréal, where he leads the Chandar Research Lab. He is also a Core Academic Member at Mila – Quebec Artificial Intelligence Institute, and holds a Canada CIFAR AI Chair and a Canada Research Chair in Lifelong Machine Learning.

His research interests include lifelong learning, deep learning, optimization, reinforcement learning, and natural language processing. To promote research on lifelong learning, Sarath Chandar created the Conference on Lifelong Learning Agents (CoLLAs) in 2022 and served as program chair in 2022 and 2023. He holds a PhD from the Université de Montréal and an MS by Research from the Indian Institute of Technology Madras.

Current Students

Ista Abbes

Research Master's - UdeM

Davide Baldelli

PhD - Polytechnique

Co-supervisor:

Research Master's - Polytechnique

Naga Karthik Enamundram

PhD - Polytechnique

Principal supervisor:

Julien Cohen-Adad

emvnagakarthik@gmail.com

Prashant Govindarajan

PhD - Polytechnique

Simon Guiroy

PhD - UdeM

Principal supervisor:

Research Collaborator - UdeM

Principal supervisor:

PhD - UdeM

David Heurtel--Depeiges

PhD - Polytechnique

Amir Ardalan Kalantari Dehaghi

Jerry Huang

PhD - UdeM

Alumni Collaborator

Lola Le Breton

Research Master's - Polytechnique

Postdoctoral Fellow - UdeM

PhD - Polytechnique

Roshan Balaji Munirathinam Sankaran Balaji

Mohamed Amine Merzouk

Postdoctoral Fellow - Polytechnique

Principal supervisor:

Research Intern - Polytechnique

Hadi NekoeiQachkanloo

PhD - UdeM

PhD - UdeM

PhD - UdeM

Postdoctoral Fellow

Independent Visiting Researcher

Mohammad R. Samsami

Research Master's - UdeM

Research Master's - Polytechnique

Arjun Vaithilingam Sudhakar

Megh Thakkar

Research Master's - UdeM

PhD - Polytechnique

Kowen Woo

Research Intern - Polytechnique

Abdelrahman Zayed

PhD - Polytechnique

Xutong Zhao

PhD - Polytechnique

Artem Zholus

PhD - Polytechnique

Blog Posts

NeoBERT: A new frontier for open-source encoder language models

March 3, 2025

by

Lola Le Breton

Quentin Fournier

Sarath Chandar

How can we explain AI and make sure the explanation is true? Faithfulness-measurable models show how

October 1, 2024

by

Andreas Madsen

Siva Reddy

Sarath Chandar

Publications

Toward Debugging Deep Reinforcement Learning Programs with RLExplorer

Rached Bouchoucha

Ahmed Haj Yahmed

Darshan Patil

Janarthanan Rajendran

Amin Nikanjam

Foutse Khomh

Deep reinforcement learning (DRL) has shown success in diverse domains such as robotics, computer games, and recommendation systems. However, like any other software system, DRL-based software systems are susceptible to faults that pose unique challenges for debugging and diagnosis. These faults often result in unexpected behavior without explicit failures or error messages, making debugging difficult and time-consuming. Therefore, automating the monitoring and diagnosis of DRL systems is crucial to alleviate the burden on developers. In this paper, we propose RLExplorer, the first fault diagnosis approach for DRL-based software systems. RLExplorer automatically monitors training traces and runs diagnosis routines based on properties of the DRL learning dynamics to detect the occurrence of DRL-specific faults. It then logs the results of these diagnoses as warnings that cover theoretical concepts, recommended practices, and potential solutions to the identified faults. We conducted two sets of evaluations to assess RLExplorer. Our first evaluation, on faulty DRL samples from Stack Overflow, revealed that our approach can effectively diagnose real faults in 83% of the cases. Our second evaluation, with 15 DRL experts/developers, showed that (1) RLExplorer could identify 3.6 times more defects than manual debugging and (2) RLExplorer is easily integrated into DRL applications.

2024-10-06

arXiv (preprint)
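To make the idea of automated diagnosis routines more concrete, here is a minimal, hypothetical Python sketch. It is not RLExplorer's implementation or API: the trace format (`TrainingTrace`), the three checks, and their thresholds are assumptions chosen for illustration. It only shows the general pattern the abstract describes: inspect recorded training traces, run property-based checks on the learning dynamics, and log warnings that suggest possible fixes.

```python
import math
from dataclasses import dataclass, field


@dataclass
class TrainingTrace:
    """Hypothetical record of values logged during DRL training."""
    losses: list = field(default_factory=list)     # one loss value per update
    returns: list = field(default_factory=list)    # one return per episode
    entropies: list = field(default_factory=list)  # policy entropy per update


def check_numerical_instability(trace):
    # Non-finite losses often indicate exploding gradients or badly scaled inputs.
    if any(not math.isfinite(x) for x in trace.losses):
        return "Loss became NaN/inf: check learning rate, reward scale, gradient clipping."
    return None


def check_stagnant_returns(trace, window=50, min_gain=1e-3):
    # Compare the mean return of the last window with the window before it.
    if len(trace.returns) < 2 * window:
        return None  # not enough episodes to judge yet
    prev = sum(trace.returns[-2 * window:-window]) / window
    last = sum(trace.returns[-window:]) / window
    if last - prev < min_gain:
        return "Episode return is not improving: check the reward signal and exploration."
    return None


def check_entropy_collapse(trace, threshold=0.05):
    # Near-zero policy entropy suggests the policy may have stopped exploring.
    if trace.entropies and trace.entropies[-1] < threshold:
        return "Policy entropy collapsed: consider an entropy bonus or more exploration."
    return None


# Hypothetical registry of diagnosis routines, run in order.
CHECKS = [check_numerical_instability, check_stagnant_returns, check_entropy_collapse]


def diagnose(trace):
    """Run every check on the trace and collect the warnings that fire."""
    return [msg for msg in (check(trace) for check in CHECKS) if msg is not None]


if __name__ == "__main__":
    trace = TrainingTrace(losses=[0.5, float("nan")], returns=[0.0] * 100, entropies=[0.01])
    for warning in diagnose(trace):
        print("WARNING:", warning)
```

Each check is a plain function that returns either `None` or a warning string, so in this sketch adding a new diagnosis only means appending a function to `CHECKS`.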

Toward Debugging Deep Reinforcement Learning Programs with RLExplorer

Rached Bouchoucha

Ahmed Haj Yahmed

Darshan Patil

Janarthanan Rajendran

Amin Nikanjam

Foutse Khomh

2024-10-06

2024 IEEE International Conference on Software Maintenance and Evolution (ICSME) (published)

Balancing Context Length and Mixing Times for Reinforcement Learning at Scale

Matthew D Riemer

Khimya Khetarpal

Janarthanan Rajendran

2024-09-25

NeurIPS.cc/2024/Conference (poster)

Protein Language Models: Is Scaling Necessary?

Quentin Fournier

Robert M. Vernon

Almer van der Sloot

Benjamin Schulz

Christopher James Langmead

Public protein sequence databases contain samples from the fitness landscape explored by nature. Protein language models (pLMs) pre-trained on these sequences aim to capture this landscape for tasks like property prediction and protein design. Following the same trend as in natural language processing, pLMs have continuously been scaled up. However, the premise that scale leads to better performance assumes that source databases provide an accurate representation of the underlying fitness landscape, which is likely false. By developing an efficient codebase, designing a modern architecture, and addressing data quality concerns such as sample bias, we introduce AMPLIFY, a best-in-class pLM that is orders of magnitude less expensive to train and deploy than previous models. Furthermore, to support the scientific community and democratize the training of pLMs, we have open-sourced AMPLIFY’s pre-training codebase, data, and model checkpoints.

2024-09-23

bioRxiv (preprint)

Are self-explanations from Large Language Models faithful?

Andreas Madsen

Siva Reddy

2024-08-01

Findings of the Association for Computational Linguistics: ACL 2024 (published)

Should We Attend More or Less? Modulating Attention for Fairness

Abdelrahman Zayed

Goncalo Mordido

Samira Shabanian

2024-07-10

colmweb.org/COLM/2024/Conference (accepted)

Lookbehind-SAM: k steps back, 1 step forward

Goncalo Mordido

Pranshu Malviya

Aristide Baratin

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

proceedings.mlr.press

A Reinforcement Learning Pipeline for Band Gap-directed Crystal Generation

Prashant Govindarajan

Mathieu Reymond

Santiago Miret

Antoine Clavaud

Mariano Phielipp

Property-driven AI-automated material discovery presents unique challenges owing to the complex nature of the chemical structural space and computationally expensive simulations. For crystalline solids, the band gap is an important property for designing semiconductors and batteries. However, optimizing crystals for a target band gap is difficult and not well explored. Reinforcement learning (RL) shows promise for optimizing crystals, as it can freely explore the chemical space. However, it relies on frequent band gap evaluations, which can only be accurately computed through expensive Density Functional Theory (DFT) simulations. In this study, we propose an active learning-inspired pipeline that combines RL and DFT simulations to optimize crystal compositions for a target band gap. The pipeline includes an RL policy for predicting atom types and a band gap network that is fine-tuned with DFT data. Preliminary results indicate that the state of the art must be advanced further to address the inherent challenges of this problem.

2024-07-08

BOKU.ac.at/2024/AI4Mat (poster)

Language Model-In-The-Loop: Data Optimal Approach to Recommend Actions in Text Games

Arjun V Sudhakar

Prasanna Parthasarathi

Janarthanan Rajendran

Large Language Models (LLMs) have demonstrated superior performance in language understanding benchmarks. A recent use case for LLMs involves training decision-making agents over textual information. The existing approach leverages LLMs' linguistic priors to recommend action candidates in text games, i.e., to operate without environment-provided actions. However, adapting LLMs to specific games/tasks requires a massive amount of annotated human gameplay. Moreover, in the existing approach, the language model was kept frozen during the agent's training process, which limits learning from in-game knowledge about the world. Hence, we explore strategies to adapt the language model for candidate recommendation using in-game transitions in an online learning fashion, mitigating reliance on human-annotated gameplay, which is costly to acquire. In this paper, we propose in-game transition selection methods to adapt the LLM in the loop, reducing the dependency on human-annotated gameplay while improving performance and convergence. Our method demonstrates a 53% relative improvement in average game score over the previous state-of-the-art model, achieving more than twice the convergence rate in a fully annotated dataset setting. Furthermore, even with only 10% of the human annotations, we surpassed the state-of-the-art performance obtained with 100% of the annotations.

2024-06-20

rl-conference.cc/RLC/2024/Workshop/TAFM (published)

Promoting Exploration in Memory-Augmented Adam using Critical Momenta

Pranshu Malviya

Goncalo Mordido

Aristide Baratin

Reza Babanezhad Harikandeh

Jerry Huang

Simon Lacoste-Julien

Razvan Pascanu

Adaptive gradient-based optimizers, particularly Adam, have left their mark in training large-scale deep learning models. The strength of such optimizers is that they exhibit fast convergence while being more robust to hyperparameter choice. However, they often generalize worse than non-adaptive methods. Recent studies have tied this performance gap to flat minima selection: adaptive methods tend to find solutions in sharper basins of the loss landscape, which in turn hurts generalization. To overcome this issue, we propose a new memory-augmented version of Adam that promotes exploration towards flatter minima by using a buffer of critical momentum terms during training. Intuitively, the use of the buffer makes the optimizer overshoot outside the basin of attraction if it is not wide enough. We empirically show that our method improves the performance of several variants of Adam on standard supervised language modelling and image classification tasks.

2024-06-09

TMLR (accepted)
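The abstract describes the buffer of critical momenta only at a high level. The following pure-Python sketch is a simplified, hypothetical reading of that idea, not the paper's exact update rule. It assumes that each buffered momentum is prioritized by the gradient norm at the step it was stored, that priorities decay geometrically (`decay`), that the buffer keeps the `capacity` highest-priority entries, and that the update direction is the current first moment plus the mean of the buffered momenta, followed by the usual Adam normalization. Adding old, high-priority momenta is intended to produce the overshooting behavior the abstract mentions.

```python
import heapq
import math


class AdamCriticalMomentaSketch:
    """Simplified, hypothetical Adam variant with a buffer of "critical" momenta.

    Assumptions (illustrative, not the paper's exact rule):
    - a momentum's priority is the gradient L2 norm at the step it was stored;
    - stored priorities decay by `decay` at every step;
    - the buffer keeps the `capacity` highest-priority momenta;
    - the update direction is the current first moment plus the buffer mean.
    """

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                 capacity=5, decay=0.7):
        self.params = [float(p) for p in params]
        self.lr, self.eps = lr, eps
        self.b1, self.b2 = betas
        self.capacity, self.decay = capacity, decay
        self.m = [0.0] * len(self.params)  # first moment (momentum)
        self.v = [0.0] * len(self.params)  # second moment
        self.buffer = []                   # (priority, momentum snapshot) pairs
        self.t = 0

    def step(self, grads):
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g
        # Age the stored momenta so that old entries are eventually replaced.
        self.buffer = [(p * self.decay, mom) for p, mom in self.buffer]
        n = len(self.buffer)
        for i in range(len(self.params)):
            # Current momentum plus the mean of the buffered critical momenta.
            agg = self.m[i]
            if n:
                agg += sum(mom[i] for _, mom in self.buffer) / n
            m_hat = agg / (1 - self.b1 ** self.t)
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            self.params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
        # Offer the current momentum to the buffer, prioritized by gradient norm.
        priority = math.sqrt(sum(g * g for g in grads))
        self.buffer.append((priority, list(self.m)))
        self.buffer = heapq.nlargest(self.capacity, self.buffer, key=lambda e: e[0])
        return self.params


if __name__ == "__main__":
    # Toy problem: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
    opt = AdamCriticalMomentaSketch([0.0], lr=0.1)
    for _ in range(200):
        opt.step([2 * (opt.params[0] - 3.0)])
    print("x after 200 steps:", opt.params[0])
```

The class name, the priority definition, and the aggregation by simple averaging are all illustrative choices; the paper should be consulted for the actual buffer management and update rule.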

Why Don't Prompt-Based Fairness Metrics Correlate?

Abdelrahman Zayed

Goncalo Mordido

Ioana Baldini

The widespread use of large language models has brought up essential questions about the potential biases these models might learn. This led to the development of several metrics aimed at evaluating and mitigating these biases. In this paper, we first demonstrate that prompt-based fairness metrics exhibit poor agreement, as measured by correlation, raising important questions about the reliability of fairness assessment using prompts. Then, we outline six relevant reasons why such a low correlation is observed across existing metrics. Based on these insights, we propose a method called Correlated Fairness Output (CAIRO) to enhance the correlation between fairness metrics. CAIRO augments the original prompts of a given fairness metric by using several pre-trained language models and then selects the combination of the augmented prompts that achieves the highest correlation across metrics. We show a significant improvement in Pearson correlation from 0.3 and 0.18 to 0.90 and 0.98 across metrics for gender and religion biases, respectively. Our code is available at https://github.com/chandar-lab/CAIRO.

2024-06-09

arXiv (preprint)
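The selection step of CAIRO can be illustrated with a small sketch. This is a hypothetical simplification, not the code in the chandar-lab/CAIRO repository: it skips prompt augmentation entirely and assumes we already have, for each candidate prompt combination, the bias scores that each fairness metric assigns to the same ordered list of models. It averages the pairwise Pearson correlations between metrics and keeps the combination with the highest agreement. The metric names, candidate names, and scores are made-up placeholders.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_pairwise_correlation(scores_by_metric):
    """Average Pearson correlation over every pair of fairness metrics.

    `scores_by_metric` maps a metric name to the bias scores that metric
    assigns to the same ordered list of evaluated models.
    """
    pairs = list(itertools.combinations(scores_by_metric.values(), 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def select_prompt_combination(candidates):
    """Return the name of the prompt combination whose metrics agree most."""
    return max(candidates, key=lambda name: mean_pairwise_correlation(candidates[name]))


# Made-up placeholder scores for four models under two candidate prompt sets.
candidates = {
    "original": {
        "metric_A": [0.2, 0.5, 0.3, 0.8],
        "metric_B": [0.6, 0.1, 0.4, 0.3],
        "metric_C": [0.5, 0.2, 0.6, 0.1],
    },
    "augmented_1": {
        "metric_A": [0.2, 0.4, 0.3, 0.7],
        "metric_B": [0.3, 0.5, 0.35, 0.9],
        "metric_C": [0.1, 0.3, 0.2, 0.6],
    },
}

if __name__ == "__main__":
    for name, scores in candidates.items():
        print(name, round(mean_pairwise_correlation(scores), 3))
    print("selected:", select_prompt_combination(candidates))
```

Averaging over all metric pairs is one simple way to turn "correlation across metrics" into a single score to maximize; the paper may aggregate differently.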