
Samira Ebrahimi Kahou

Affiliate Member
Canada CIFAR AI Chair
Assistant Professor, University of Calgary, Department of Electrical and Software Engineering
Adjunct Professor, École de technologie supérieure, Department of Computer Engineering and Information Technology
Adjunct Professor, McGill University, School of Computer Science
Research Topics
Computer Vision
Deep Learning
Medical Machine Learning
Multimodal Learning
Natural Language Processing
Reinforcement Learning
Representation Learning

Biography

I am an Assistant Professor in the Department of Electrical and Software Engineering at the University of Calgary's Schulich School of Engineering. I am also an adjunct professor in the Department of Computer Engineering and Information Technology at ÉTS and in the School of Computer Science at McGill University. Before joining ÉTS, I was a postdoctoral fellow working with Professor Doina Precup at McGill/Mila. Before my postdoc, I was a researcher at Microsoft Research Montréal.

I received my Ph.D. from Polytechnique Montréal/Mila in 2016 under the supervision of Professor Chris Pal. During my Ph.D. studies, I worked on computer vision and deep learning applied to emotion recognition, object tracking and knowledge distillation.

Current Students

Master's Research - École de technologie supérieure
PhD - École de technologie supérieure
PhD - Université de Montréal
Principal supervisor:
Collaborating researcher - McGill University
Co-supervisor:
Professional Master's - Université de Montréal
Master's Research - École de technologie supérieure
Principal supervisor:
Master's Research - École de technologie supérieure
PhD - École de technologie supérieure
Principal supervisor:
PhD - McGill University
Co-supervisor:
Master's Research - École de technologie supérieure
PhD - McGill University
Principal supervisor:
Master's Research - McGill University
Principal supervisor:

Publications

CAMMARL: Conformal Action Modeling in Multi Agent Reinforcement Learning
Nikunj Gupta
Somjit Nath
Before taking actions in an environment with more than one intelligent agent, an autonomous agent may benefit from reasoning about the other agents and utilizing a notion of a guarantee or confidence about the behavior of the system. In this article, we propose a novel multi-agent reinforcement learning (MARL) algorithm CAMMARL, which involves modeling the actions of other agents in different situations in the form of confident sets, i.e., sets containing their true actions with a high probability. We then use these estimates to inform an agent's decision-making. For estimating such sets, we use the concept of conformal predictions, by means of which, we not only obtain an estimate of the most probable outcome but get to quantify the operable uncertainty as well. For instance, we can predict a set that provably covers the true predictions with high probabilities (e.g., 95%). Through several experiments in two fully cooperative multi-agent tasks, we show that CAMMARL elevates the capabilities of an autonomous agent in MARL by modeling conformal prediction sets over the behavior of other agents in the environment and utilizing such estimates to enhance its policy learning.
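The core calibration step behind such conformal sets can be illustrated with standard split conformal prediction. This is a minimal numpy sketch, not the paper's implementation: the calibration data and the model's action probabilities are made up for illustration.

```python
import numpy as np

def conformal_action_set(cal_probs, cal_labels, test_probs, alpha=0.05):
    """Build a conformal prediction set over another agent's actions.

    cal_probs:  (n, k) predicted action probabilities on calibration data
    cal_labels: (n,)   true actions taken on calibration data
    test_probs: (k,)   predicted probabilities for a new situation
    Returns the actions whose nonconformity score clears the (1 - alpha)
    quantile, so the set contains the true action with probability ~1 - alpha.
    """
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability assigned to the true action.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile of the calibration scores.
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    return [a for a, p in enumerate(test_probs) if 1.0 - p <= q]

# Toy example: a well-calibrated model over 3 possible actions.
rng = np.random.default_rng(0)
cal_labels = rng.integers(0, 3, size=500)
cal_probs = np.full((500, 3), 0.1)
cal_probs[np.arange(500), cal_labels] = 0.8  # 0.8 on the true action
action_set = conformal_action_set(cal_probs, cal_labels,
                                  np.array([0.85, 0.10, 0.05]))
print(action_set)  # only the confidently predicted action survives: [0]
```

In CAMMARL the resulting set would then be fed into the agent's policy as a summary of what the other agent might do.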
Overcoming Interpretability and Accuracy Trade-off in Medical Imaging
Ivaxi Sheth
Source-free Domain Adaptation Requires Penalized Diversity
Laya Rafiee Sevyeri
Ivaxi Sheth
Farhood Farahnak
Alexandre See
Thomas Fevens
Mohammad Havaei
While neural networks are capable of achieving human-like performance in many tasks such as image classification, the impressive performance of each model is limited to its own dataset. Source-free domain adaptation (SFDA) was introduced to address knowledge transfer between different domains in the absence of source data, thus, increasing data privacy. Diversity in representation space can be vital to a model's adaptability in varied and difficult domains. In unsupervised SFDA, the diversity is limited to learning a single hypothesis on the source or learning multiple hypotheses with a shared feature extractor. Motivated by the improved predictive performance of ensembles, we propose a novel unsupervised SFDA algorithm that promotes representational diversity through the use of separate feature extractors with Distinct Backbone Architectures (DBA). Although diversity in feature space is increased, the unconstrained mutual information (MI) maximization may potentially introduce amplification of weak hypotheses. Thus we introduce the Weak Hypothesis Penalization (WHP) regularizer as a mitigation strategy. Our work proposes Penalized Diversity (PD) where the synergy of DBA and WHP is applied to unsupervised source-free domain adaptation for covariate shift. In addition, PD is augmented with a weighted MI maximization objective for label distribution shift. Empirical results on natural, synthetic, and medical domains demonstrate the effectiveness of PD under different distributional shifts.
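The MI-maximization objective and the risk of amplifying a weak hypothesis can be sketched numerically. This is an illustrative toy, not the paper's exact loss: MI maximization on unlabeled target data rewards confident per-sample predictions (low conditional entropy) plus diverse predictions overall (high marginal entropy), and a weak, uninformative hypothesis can be down-weighted accordingly.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    return -(p * np.log(p + eps)).sum(axis=axis)

def mi_objective(probs):
    """probs: (n, k) softmax outputs of one hypothesis on target data."""
    cond = entropy(probs).mean()         # want low: confident predictions
    marg = entropy(probs.mean(axis=0))   # want high: classes used evenly
    return marg - cond

# Two hypotheses with distinct backbones; penalize the weaker one.
strong = np.array([[0.90, 0.10], [0.10, 0.90], [0.85, 0.15], [0.20, 0.80]])
weak = np.full((4, 2), 0.5)              # uninformative hypothesis
scores = np.array([mi_objective(strong), mi_objective(weak)])
weights = np.exp(scores) / np.exp(scores).sum()  # weak hypothesis gets less say
print(weights)
```

The actual WHP regularizer is more involved, but the intuition is the same: an unconstrained MI objective would happily amplify the weak hypothesis, so its contribution must be penalized.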
Auxiliary Losses for Learning Generalizable Concept-based Models
Ivaxi Sheth
Learning from uncertain concepts via test time interventions
Ivaxi Sheth
Aamer Abdul Rahman
Laya Rafiee Sevyeri
Mohammad Havaei
With neural networks applied to safety-critical applications, it has become increasingly important to understand the defining features of decision-making. Therefore, the need to uncover these black boxes and map them to a rational representational space is apparent. The concept bottleneck model (CBM) encourages interpretability by predicting human-understandable concepts: it predicts concepts from input images and then labels from concepts. Test-time intervention, a salient feature of CBM, allows for human-model interactions. However, these interactions are prone to information leakage and can often be ineffective due to inappropriate communication with humans. We propose a novel uncertainty-based strategy, SIUL: Single Interventional Uncertainty Learning, to select the interventions. Additionally, we empirically test the robustness of CBM and the effect of SIUL interventions under adversarial attack and distributional shift. Using SIUL, we observe that the suggested interventions lead to meaningful corrections along with mitigation of concept leakage. Extensive experiments on three vision datasets along with a histopathology dataset validate the effectiveness of our interventional learning.
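The predict-concepts-then-label pipeline and an uncertainty-guided test-time intervention can be shown in a few lines. This is a hypothetical toy model (made-up weights, a single linear layer per stage), not SIUL itself: the intervention simply overwrites the single most uncertain predicted concept with its ground-truth value.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights: 3 concepts predicted from 2 input features,
# then a label predicted from the concepts.
W_concept = np.array([[2.0, -1.0], [0.1, 0.05], [-1.5, 2.0]])
W_label = np.array([1.0, 1.0, 1.0])

def predict(x, true_concepts=None):
    c = sigmoid(W_concept @ x)            # concept bottleneck
    if true_concepts is not None:
        # Intervene on the most uncertain concept (probability closest
        # to 0.5), in the spirit of uncertainty-guided selection.
        i = int(np.argmin(np.abs(c - 0.5)))
        c[i] = true_concepts[i]
    return c, sigmoid(W_label @ c)        # label head reads the concepts

x = np.array([1.0, 1.0])
c_free, y_free = predict(x)
c_int, y_int = predict(x, true_concepts=np.array([1.0, 1.0, 1.0]))
print(y_free, y_int)  # intervening on the uncertain concept moves the label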
Learning Latent Structural Causal Models
Jithendaraa Subramanian
Yashas Annadani
Ivaxi Sheth
Nan Rosemary Ke
Tristan Deleu
Stefan Bauer
Causal learning has long concerned itself with the accurate recovery of underlying causal mechanisms. Such causal modelling enables better explanations of out-of-distribution data. Prior works on causal learning assume that the high-level causal variables are given. However, in machine learning tasks, one often operates on low-level data like image pixels or high-dimensional vectors. In such settings, the entire Structural Causal Model (SCM) -- structure, parameters, and high-level causal variables -- is unobserved and needs to be learnt from low-level data. We treat this problem as Bayesian inference of the latent SCM, given low-level data. For linear Gaussian additive noise SCMs, we present a tractable approximate inference method which performs joint inference over the causal variables, structure and parameters of the latent SCM from random, known interventions. Experiments are performed on synthetic datasets and a causally generated image dataset to demonstrate the efficacy of our approach. We also perform image generation from unseen interventions, thereby verifying out of distribution generalization for the proposed causal model.
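The generative model being inverted here can be sketched as a linear-Gaussian SCM whose variables are only observed through a low-level projection. This is an illustrative sample-generation toy with made-up parameters; in the paper's setting the weights, noise, and the causal variables themselves would all be inferred from the low-level data.

```python
import numpy as np

rng = np.random.default_rng(0)

# z_i = sum_j W[j, i] * z_j + noise; W is strictly upper-triangular in the
# causal order, so the graph z0 -> z1 -> z2 is acyclic.
W = np.array([[0.0, 0.8,  0.0],
              [0.0, 0.0, -0.5],
              [0.0, 0.0,  0.0]])
P = rng.normal(size=(3, 10))  # projection from causal variables to 10-dim data

def sample_scm(n, intervene=None):
    z = np.zeros((n, 3))
    for i in range(3):                      # ancestral sampling in causal order
        if intervene is not None and intervene[0] == i:
            z[:, i] = intervene[1]          # hard intervention do(z_i = value)
        else:
            z[:, i] = z @ W[:, i] + rng.normal(size=n)
    return z, z @ P                         # high-level z, low-level x

z_obs, x_obs = sample_scm(1000)
z_int, _ = sample_scm(1000, intervene=(0, 2.0))  # known intervention on z0
```

Under the intervention do(z0 = 2), the downstream variable z1 shifts (its mean becomes roughly 0.8 × 2 = 1.6), which is the kind of interventional signal the inference method exploits.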
Revisiting Learnable Affines for Batch Norm in Few-Shot Transfer Learning
Moslem Yazdanpanah
Aamer Abdul Rahman
Muawiz Chaudhary
Christian Desrosiers
Mohammad Havaei
Batch normalization is a staple of computer vision models, including those employed in few-shot learning. Batch normalization layers in convolutional neural networks are composed of a normalization step, followed by a shift and scale of these normalized features applied via the per-channel trainable affine parameters
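The two-step structure the abstract describes is easy to see in code. This is a minimal numpy sketch of a batch-norm forward pass (training-mode statistics, no running averages), with the per-channel affine parameters gamma and beta applied after normalization.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Per-channel batch norm over (N, C, H, W) feature maps:
    normalize over the batch and spatial dims, then apply the
    trainable affine (scale gamma, shift beta)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)          # normalization step
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

rng = np.random.default_rng(0)
x = rng.normal(3.0, 2.0, size=(8, 4, 5, 5))          # toy feature maps
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
# With identity affines, the output is zero-mean, unit-variance per channel.
print(np.allclose(y.mean(axis=(0, 2, 3)), 0, atol=1e-6))  # True
```

Freezing or re-learning gamma and beta during few-shot transfer is exactly the design choice the paper revisits.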
Latent Variable Sequential Set Transformers for Joint Multi-Agent Motion Prediction
Roger Girgis
Florian Golemo
Felipe Codevilla
Martin Weiss
Jim Aldon D'Souza
Felix Heide
Robust multi-agent trajectory prediction is essential for the safe control of robotic systems. A major challenge is to efficiently learn a representation that approximates the true joint distribution of contextual, social, and temporal information to enable planning. We propose Latent Variable Sequential Set Transformers which are encoder-decoder architectures that generate scene-consistent multi-agent trajectories. We refer to these architectures as “AutoBots”. The encoder is a stack of interleaved temporal and social multi-head self-attention (MHSA) modules which alternately perform equivariant processing across the temporal and social dimensions. The decoder employs learnable seed parameters in combination with temporal and social MHSA modules allowing it to perform inference over the entire future scene in a single forward pass efficiently. AutoBots can produce either the trajectory of one ego-agent or a distribution over the future trajectories for all agents in the scene. For the single-agent prediction case, our model achieves top results on the global nuScenes vehicle motion prediction leaderboard, and produces strong results on the Argoverse vehicle prediction challenge. In the multi-agent setting, we evaluate on the synthetic partition of TrajNet++ dataset to showcase the model’s socially-consistent predictions. We also demonstrate our model on general sequences of sets and provide illustrative experiments modelling the sequential structure of the multiple strokes that make up symbols in the Omniglot data. A distinguishing feature of AutoBots is that all models are trainable on a single desktop GPU (1080 Ti) in under 48h.
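The "interleaved temporal and social attention" idea can be sketched on a small scene tensor. This is a simplified single-head toy with unparameterized attention (no learned projections), not the AutoBots encoder: attention is applied along the time axis for each agent, then along the agent axis for each time step.

```python
import numpy as np

def self_attention(X):
    """X: (n, d) -> (n, d), plain scaled dot-product self-attention."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X

def encoder_block(S):
    """S: (agents, time, d) scene tensor; alternate temporal and social mixing."""
    # Temporal attention: each agent attends over its own time steps.
    S = np.stack([self_attention(agent_seq) for agent_seq in S])
    # Social attention: at each time step, agents attend over each other.
    S = np.stack([self_attention(S[:, t]) for t in range(S.shape[1])], axis=1)
    return S

rng = np.random.default_rng(0)
scene = rng.normal(size=(4, 10, 16))  # 4 agents, 10 time steps, 16-dim features
out = encoder_block(scene)
print(out.shape)  # (4, 10, 16)
```

Because each axis is processed independently of the other, the block is equivariant to permuting agents and to relabeling within each axis, which is the property the encoder exploits.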
Simple Video Generation using Neural ODEs
David Kanaa
Vikram Voleti
Despite having been studied to a great extent, the task of conditional generation of sequences of frames, or videos, remains extremely challenging. It is a common belief that a key step towards solving this task resides in modelling accurately both spatial and temporal information in video signals. A promising direction to do so has been to learn latent variable models that predict the future in latent space and project back to pixels, as suggested in recent literature. Following this line of work and building on top of a family of models introduced in prior work, Neural ODE, we investigate an approach that models time-continuous dynamics over a continuous latent space with a differential equation with respect to time. The intuition behind this approach is that these trajectories in latent space could then be extrapolated to generate video frames beyond the time steps for which the model is trained. We show that our approach yields promising results in the task of future frame prediction on the Moving MNIST dataset with 1 and 2 digits.
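The latent-ODE idea can be sketched with a fixed vector field standing in for the learned dynamics. This is an illustrative toy, not the paper's model: the latent state evolves under dz/dt = f(z), integrated here with simple Euler steps, and a decoder (omitted) would map each latent state back to a frame; extrapolation just means integrating past the training horizon.

```python
import numpy as np

# A rotation field stands in for the learned dynamics f(z).
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

def f(z):
    return A @ z

def rollout(z0, n_steps, dt=0.1):
    """Euler-integrate the latent ODE; each state would be decoded to a frame."""
    zs = [z0]
    for _ in range(n_steps):
        zs.append(zs[-1] + dt * f(zs[-1]))
    return np.stack(zs)

traj = rollout(np.array([1.0, 0.0]), n_steps=20)
print(traj.shape)  # (21, 2): initial state plus 20 integrated latent states
```

In practice the integration is done with an adaptive ODE solver and f is a neural network, but the train-then-extrapolate mechanics are the same.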
Deep Learning for Detecting Extreme Weather Patterns
Mayur Mudigonda
Prabhat Ram
Karthik Kashinath
Evan Racah
Ankur Mahesh
Yunjie Liu
Christopher Beckham
Jim Biard
Thorsten Kurth
Sookyung Kim
Burlen Loring
Travis O'Brien
Kenneth E. Kunkel
Michael F. Wehner
William D. Collins
Accounting for Variance in Machine Learning Benchmarks
Xavier Bouthillier
Pierre Delaunay
Mirko Bronzi
Assya Trofimov
Brennan Nichyporuk
Justin Szeto
Naz Sepah
Edward Raff
Kanika Madan
Vikram Voleti
Vincent Michalski
Dmitriy Serdyuk
Gael Varoquaux
Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter initialization, and hyperparameter choices. This is prohibitively expensive, and corners are cut to reach conclusions. We model the whole benchmarking process, revealing that variance due to data sampling, parameter initialization and hyperparameter choice markedly impacts the results. We analyze the predominant comparison methods used today in the light of this variance. We show a counter-intuitive result that adding more sources of variation to an imperfect estimator better approaches the ideal estimator, at a 51-times reduction in compute cost. Building on these results, we study the error rate of detecting improvements on five different deep-learning tasks/architectures. This study leads us to propose recommendations for performance comparisons.
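The pitfall being measured is easy to demonstrate. The scores below are hypothetical test accuracies for two pipelines A and B over five randomized runs (different data splits, initializations, hyperparameter draws): a single run can suggest the wrong winner, while averaging over the sources of variation recovers the true ordering.

```python
import numpy as np

# Hypothetical benchmark scores; A is better on average, but not on every run.
a_runs = np.array([81.2, 79.8, 80.9, 80.4, 81.0])
b_runs = np.array([80.5, 80.1, 79.4, 80.2, 79.9])

print(a_runs[1] < b_runs[1])              # one run alone suggests B wins: True

gap = a_runs.mean() - b_runs.mean()        # averaged comparison favors A
stderr = np.sqrt(a_runs.var(ddof=1) / 5 + b_runs.var(ddof=1) / 5)
print(round(gap, 2), round(stderr, 2))     # effect size with its uncertainty
```

Reporting the gap together with a standard error estimated across randomized runs, rather than a single-seed number, is the kind of practice the paper's recommendations formalize.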
Multi-Image Super-Resolution for Remote Sensing using Deep Recurrent Networks
Md Rifat Arefin
Vincent Michalski
Pierre-Luc St-Charles
Alfredo Kalaitzis
Sookyung Kim
High-resolution satellite imagery is critical for various earth observation applications related to environment monitoring, geoscience, forecasting, and land use analysis. However, the acquisition cost of such high-quality imagery, due to the scarcity of providers and the need for high-frequency revisits, restricts its accessibility in many fields. In this work, we present a data-driven, multi-image super-resolution approach to alleviate these problems. Our approach is based on an end-to-end deep neural network that consists of an encoder, a fusion module, and a decoder. The encoder extracts co-registered, highly efficient feature representations from low-resolution images of a scene. A Gated Recurrent Unit (GRU)-based module acts as the fusion module, aggregating features into a combined representation. Finally, a decoder reconstructs the super-resolved image. The proposed model is evaluated on the PROBA-V dataset released in a recent competition held by the European Space Agency. Our results show that it performs among the top contenders and offers a new practical solution for real-world applications.
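The fusion step, a GRU folding a sequence of per-image features into one combined representation, can be sketched with a bare GRU cell. This is a schematic with toy sizes and random, untrained weights, not the paper's network: each low-res image would first be encoded to a feature vector, the GRU cell aggregates the sequence, and a decoder (omitted) would reconstruct the super-resolved image from the final state.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy feature dimension
Wz, Uz = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1
Wr, Ur = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1
Wh, Uh = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_fuse(features):
    """Fold encoded low-res views into a single fused representation."""
    h = np.zeros(d)
    for x in features:                     # one encoded low-res image per step
        z = sigmoid(Wz @ x + Uz @ h)       # update gate
        r = sigmoid(Wr @ x + Ur @ h)       # reset gate
        h_tilde = np.tanh(Wh @ x + Uh @ (r * h))
        h = (1 - z) * h + z * h_tilde
    return h                               # input to the decoder

low_res_features = [rng.normal(size=d) for _ in range(9)]  # e.g. 9 PROBA-V views
fused = gru_fuse(low_res_features)
print(fused.shape)  # (8,)
```

A recurrent fusion module has the useful property that it accepts a variable number of low-resolution views per scene, which matters for datasets like PROBA-V where the number of revisits differs across scenes.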