Sarath Chandar

Biography

Sarath Chandar is an associate professor in the Department of Computer and Software Engineering at Polytechnique Montréal, where he leads the Chandar Research Lab. He is also a core academic member of Mila – Quebec Artificial Intelligence Institute and holds a Canada CIFAR AI Chair and a Canada Research Chair in Lifelong Machine Learning.

His research focuses on lifelong learning, deep learning, optimization, reinforcement learning, and natural language processing. To promote research on lifelong learning, Sarath Chandar founded the Conference on Lifelong Learning Agents (CoLLAs) in 2022 and served as its program chair in 2022 and 2023. He holds a PhD from the Université de Montréal and an MS by Research from the Indian Institute of Technology Madras.

Current Students

Ista Abbes

Research Master's - UdeM

Alex Aselstyne

Research Intern - Polytechnique

Davide Baldelli

PhD - Polytechnique

Co-supervisor:

joe Ben

Research Intern - Polytechnique

joumenbensaid@gmail.com

Milan Bhan

Research Collaborator

Antoine Clavaud

Research Master's - Polytechnique

Naga Karthik Enamundram

PhD - Polytechnique

Principal supervisor:

Julien Cohen-Adad

emvnagakarthik@gmail.com

Prashant Govindarajan

PhD - Polytechnique

Simon Guiroy

PhD - UdeM

Principal supervisor:

Research Collaborator - UdeM

Principal supervisor:

PhD - UdeM

David Heurtel-Depeiges

PhD - Polytechnique

Amir Ardalan Kalantari Dehaghi

Jerry Huang

PhD - UdeM

Alumni Collaborator

Lola Le Breton

Research Master's - Polytechnique

Postdoc - UdeM

PhD - Polytechnique

Roshan Munirathinam Sankaran Balaji

Mohamed Amine Merzouk

Postdoc - Polytechnique

Principal supervisor:

Research Intern - Polytechnique

Hadi NekoeiQachkanloo

PhD - UdeM

PhD - UdeM

PhD - UdeM

Postdoc

Independent Visiting Researcher

Mohammad R. Samsami

Research Master's - UdeM

Research Master's - Polytechnique

Arjun Vaithilingam Sudhakar

Megh Thakkar

Research Master's - UdeM

PhD - Polytechnique

Kowen Woo

Research Intern - Polytechnique

Abdelrahman Zayed

PhD - Polytechnique

Xutong Zhao

PhD - Polytechnique

Artem Zholus

PhD - Polytechnique

Blog posts

A digital picture of Bert from Sesame Street, wearing a black trench coat and sunglasses

March 3, 2025

NeoBERT: A New Frontier for Open-Source Encoder Language Models

by

Lola Le Breton

Quentin Fournier

Sarath Chandar

Read the article

October 1, 2024

How can we explain AI and make sure that explanation is true? Faithfulness measurable models show you how

by

Andrea Madsen

Siva Reddy

Sarath Chandar

Read the article

Publications

CrystalGym: A New Benchmark for Materials Discovery Using Reinforcement Learning

Prashant Govindarajan

Mathieu Reymond

Antoine Clavaud

Mariano Phielipp

Santiago Miret

*In silico* design and optimization of new materials primarily relies on high-accuracy atomic simulators that perform density functional theory (DFT) calculations. While recent works showcase the strong potential of machine learning to accelerate the material design process, they mostly consist of generative approaches that do not use direct DFT signals as feedback to improve training and generation mainly due to DFT's high computational cost. To aid the adoption of direct DFT signals in the materials design loop through online reinforcement learning (RL), we propose **CrystalGym**, an open-source RL environment for crystalline material discovery. Using CrystalGym, we benchmark value- and policy-based reinforcement learning algorithms for designing various crystals conditioned on target properties. Concretely, we optimize for challenging properties like the band gap, bulk modulus, and density, which are directly calculated from DFT in the environment. While none of the algorithms we benchmark solve all CrystalGym tasks, our extensive experiments and ablations show different sample efficiencies and ease of convergence to optimality for different algorithms and environment settings. Our goal is for CrystalGym to serve as a test bed for reinforcement learning researchers and material scientists to address these real-world design problems with practical applications. Furthermore, we introduce a novel class of challenges for reinforcement learning methods dealing with time-consuming reward signals, paving the way for future interdisciplinary research for machine learning motivated by real-world applications.

2025-03-03

ICLR.cc/2025/Workshop/AI4MAT (spotlight)

Steering Large Language Model Activations in Sparse Spaces

Reza Bayat

Ali Rahimi-Kalahroudi

Mohammad Pezeshki

Pascal Vincent

A key challenge in AI alignment is guiding large language models (LLMs) to follow desired behaviors at test time. Activation steering, which modifies internal model activations during inference, offers a potential solution. However, prior work in dense activation spaces struggles with superposition, wherein multiple features become entangled, limiting interpretability and precise control. In contrast, sparse representations provide an untapped opportunity for more interpretable behavior modulation. In this work, we introduce sparse activation steering (SAS), a method that leverages sparse autoencoders (SAEs) to steer LLM behavior in sparse spaces. By isolating behavior-specific features through a contrastive prompt-pairing approach, we define a set of features that can selectively reinforce or suppress behaviors. Experiments on Gemma 2 LLMs show that SAS vectors enable nuanced behavioral modulation and finer-grained control. Furthermore, scaling SAEs improves monosemanticity of SAS vectors, suggesting more reliable and interpretable interventions.

2025-02-28

ArXiv (preprint)
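To make the contrastive prompt-pairing idea more concrete, here is a minimal Python sketch. It assumes you already have sparse autoencoder (SAE) feature activations for prompts that exhibit a behavior and for prompts that do not. The mean-difference rule, the `top_k` cutoff, the non-negativity clamp, and all function names are illustrative assumptions, not the authors' SAS implementation; decoding the steered code back into the model's activations is left out.

```python
from typing import Dict, List, Sequence


def mean_features(acts: Sequence[Sequence[float]]) -> List[float]:
    """Average SAE feature activations over a set of prompts (one row per prompt)."""
    n = len(acts)
    return [sum(column) / n for column in zip(*acts)]


def sparse_steering_vector(
    pos_acts: Sequence[Sequence[float]],
    neg_acts: Sequence[Sequence[float]],
    top_k: int,
) -> Dict[int, float]:
    """Contrast prompts with vs. without a behavior and keep the top_k features.

    Positive weights reinforce a feature; negative weights suppress it.
    """
    diff = [p - q for p, q in zip(mean_features(pos_acts), mean_features(neg_acts))]
    # Rank features by the magnitude of the contrast and keep only the strongest.
    ranked = sorted(range(len(diff)), key=lambda i: abs(diff[i]), reverse=True)
    return {i: diff[i] for i in ranked[:top_k] if diff[i] != 0.0}


def apply_steering(code: Sequence[float], vector: Dict[int, float], alpha: float) -> List[float]:
    """Add the scaled steering vector to one sparse code, touching only its features."""
    steered = list(code)
    for i, weight in vector.items():
        # ASSUMPTION: clamp at zero because ReLU-style SAE codes are non-negative.
        steered[i] = max(0.0, steered[i] + alpha * weight)
    return steered
```

Because the vector is a small dictionary of feature indices, the intervention stays sparse: every feature outside the selected set passes through unchanged, which is the interpretability argument the abstract makes for working in SAE space.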

NeoBERT: A Next-Generation BERT

Lola Le Breton

Quentin Fournier

Mariam El Mezouar

Recent innovations in architecture, pre-training, and fine-tuning have led to the remarkable in-context learning and reasoning abilities of large auto-regressive language models such as LLaMA and DeepSeek. In contrast, encoders like BERT and RoBERTa have not seen the same level of progress despite being foundational for many downstream NLP applications. To bridge this gap, we introduce NeoBERT, a next-generation encoder that redefines the capabilities of bidirectional models by integrating state-of-the-art advancements in architecture, modern data, and optimized pre-training methodologies. NeoBERT is designed for seamless adoption: it serves as a plug-and-play replacement for existing base models, relies on an optimal depth-to-width ratio, and leverages an extended context length of 4,096 tokens. Despite its compact 250M parameter footprint, it achieves state-of-the-art results on the massive MTEB benchmark, outperforming BERT large, RoBERTa large, NomicBERT, and ModernBERT under identical fine-tuning conditions. In addition, we rigorously evaluate the impact of each modification on GLUE and design a uniform fine-tuning and evaluation framework for MTEB. We release all code, data, checkpoints, and training scripts to accelerate research and real-world adoption.

2025-02-26

ArXiv (preprint)

Sub-goal Distillation: A Method to Improve Small Language Agents

Maryam Hashemzadeh

Elias Stengel-Eskin

Marc-Alexandre Côté

While Large Language Models (LLMs) have demonstrated significant promise as agents in interactive tasks, their substantial computational requirements and restricted number of calls constrain their practical utility, especially in long-horizon interactive tasks such as decision-making or in scenarios involving continuous ongoing tasks. To address these constraints, we propose a method for transferring the performance of an LLM with billions of parameters to a much smaller language model (770M parameters). Our approach involves constructing a hierarchical agent comprising a planning module, which learns through Knowledge Distillation from an LLM to generate sub-goals, and an execution module, which learns to accomplish these sub-goals using elementary actions. In detail, we leverage an LLM to annotate an oracle path with a sequence of sub-goals towards completing a goal. Subsequently, we utilize this annotated data to fine-tune both the planning and execution modules. Importantly, neither module relies on real-time access to an LLM during inference, significantly reducing the overall cost associated with LLM interactions to a fixed cost. In ScienceWorld, a challenging and multi-task interactive text environment, our method surpasses standard imitation learning based solely on elementary actions by 16.7% (absolute). Our analysis highlights the efficiency of our approach compared to other LLM-based methods. Our code and annotated data for distillation can be found on GitHub.

2025-02-17

Proceedings of The 3rd Conference on Lifelong Learning Agents (published)
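The planner/executor split described in this abstract can be sketched as a simple control loop. In this hypothetical Python version the planner and executor are plain callables standing in for the distilled small language models, and `env_step` stands in for an interactive environment such as ScienceWorld; none of these names or signatures come from the paper's code, and the executor here returns all actions for a sub-goal at once, which is a simplifying assumption.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces standing in for the distilled models and the environment.
Planner = Callable[[str, List[str]], Optional[str]]  # (task, finished sub-goals) -> next sub-goal or None
Executor = Callable[[str, str], List[str]]           # (sub-goal, observation) -> elementary actions
EnvStep = Callable[[str], str]                       # action -> next observation


def run_hierarchical_agent(
    task: str,
    planner: Planner,
    executor: Executor,
    env_step: EnvStep,
    observation: str,
    max_subgoals: int = 10,
) -> Tuple[List[str], List[str]]:
    """Alternate between planning a sub-goal and executing it with elementary actions."""
    completed: List[str] = []
    actions_taken: List[str] = []
    for _ in range(max_subgoals):
        subgoal = planner(task, completed)
        if subgoal is None:  # the planner signals that the task is finished
            break
        # The executor only sees the current sub-goal, never the full task plan.
        for action in executor(subgoal, observation):
            observation = env_step(action)
            actions_taken.append(action)
        completed.append(subgoal)
    return completed, actions_taken
```

No call in this loop reaches a large model, so in this sketch the inference cost depends only on the two small callables.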

A Generalist Hanabi Agent

Arjun V Sudhakar

Hadi Nekoei

Mathieu Reymond

Miao Liu

Janarthanan Rajendran

Traditional multi-agent reinforcement learning (MARL) systems can develop cooperative strategies through repeated interactions. However, these systems are unable to perform well on any other setting than the one they have been trained on, and struggle to successfully cooperate with unfamiliar collaborators. This is particularly visible in the Hanabi benchmark, a popular 2-to-5 player cooperative card-game which requires complex reasoning and precise assistance to other agents. Current MARL agents for Hanabi can only learn one specific game-setting (e.g., 2-player games), and play with the same algorithmic agents. This is in stark contrast to humans, who can quickly adjust their strategies to work with unfamiliar partners or situations. In this paper, we introduce Recurrent Replay Relevance Distributed DQN (R3D2), a generalist agent for Hanabi, designed to overcome these limitations. We reformulate the task using text, as language has been shown to improve transfer. We then propose a distributed MARL algorithm that copes with the resulting dynamic observation- and action-space. In doing so, our agent is the first that can play all game settings concurrently, and extend strategies learned from one setting to other ones. As a consequence, our agent also demonstrates the ability to collaborate with different algorithmic agents: agents that are themselves unable to do so.

2025-01-22

ICLR.cc/2025/Conference (poster)

NeoBERT: A Next-Generation BERT

Lola Le Breton

Quentin Fournier

John Xavier Morris

Mariam El Mezouar

2025-01-01

Trans. Mach. Learn. Res. (published)

Torque-Aware Momentum

Pranshu Malviya

Goncalo Mordido

Aristide Baratin

Reza Babanezhad Harikandeh

Gintare Karolina Dziugaite

Razvan Pascanu

Efficiently exploring complex loss landscapes is key to the performance of deep neural networks. While momentum-based optimizers are widely used in state-of-the-art setups, classical momentum can still struggle with large, misaligned gradients, leading to oscillations. To address this, we propose Torque-Aware Momentum (TAM), which introduces a damping factor based on the angle between the new gradients and previous momentum, stabilizing the update direction during training. Empirical results show that TAM, which can be combined with both SGD and Adam, enhances exploration, handles distribution shifts more effectively, and improves generalization performance across various tasks, including image classification and large language model fine-tuning, when compared to classical momentum-based optimizers.

2024-12-25

ArXiv (preprint)
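The angle-based damping idea can be illustrated with a small SGD-with-momentum sketch. Here the incoming gradient is scaled by d = (1 + cos θ) / 2 before it enters the momentum buffer, where θ is the angle between the new gradient and the previous momentum. This particular damping formula, the zero-momentum convention, and the function names are assumptions chosen for illustration, not the published TAM update rule.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention (assumption): with no momentum history, treat the gradient as aligned.
        return 1.0
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return max(-1.0, min(1.0, dot / (norm_a * norm_b)))


def damped_momentum_step(
    params: List[float],
    grad: List[float],
    momentum: List[float],
    lr: float = 0.1,
    beta: float = 0.9,
) -> Tuple[List[float], List[float], float]:
    """One SGD-with-momentum step with angle-based gradient damping.

    ASSUMPTION: damping = (1 + cos(theta)) / 2 is an illustrative choice,
    not necessarily the rule used in the TAM paper.
    """
    damping = (1.0 + cosine_similarity(grad, momentum)) / 2.0  # 1 if aligned, 0 if opposite
    new_momentum = [beta * m + damping * g for m, g in zip(momentum, grad)]
    new_params = [p - lr * m for p, m in zip(params, new_momentum)]
    return new_params, new_momentum, damping
```

In this sketch a gradient pointing directly against the accumulated momentum is ignored and an orthogonal one is halved, whereas classical momentum would add both at full strength; that contrast is what keeps large, misaligned gradients from causing oscillations.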

Too Big to Fool: Resisting Deception in Language Models

Mohammad Reza Samsami

Mats Leon Richter

Juan A. Rodriguez

Megh Thakkar

Maxime Gasse

Large language models must balance their weight-encoded knowledge with in-context information from prompts to generate accurate responses. This paper investigates this interplay by analyzing how models of varying capacities within the same family handle intentionally misleading in-context information. Our experiments demonstrate that larger models exhibit higher resilience to deceptive prompts, showcasing an advanced ability to interpret and integrate prompt information with their internal knowledge. Furthermore, we find that larger models outperform smaller ones in following legitimate instructions, indicating that their resilience is not due to disregarding in-context information. We also show that this phenomenon is likely not a result of memorization but stems from the models' ability to better leverage implicit task-relevant information from the prompt alongside their internally stored knowledge.

2024-12-13

ArXiv (preprint)
