Doina Precup

Sumana Basu

PhD - McGill University

Co-supervisor :

Adriana Romero Soriano

PhD - McGill University

Lynn Cherif

Master's Research - McGill University

Co-supervisor :

PhD - McGill University

Co-supervisor :

PhD - McGill University

Principal supervisor :

David Meger

Jonathan Colaço Carr

Master's Research - McGill University

Principal supervisor :

Prakash Panangaden

Élodie Coté-Gauthier

Collaborating researcher - McGill University

Franco Del Balso

Research Intern - Université de Montréal

Jesse Farebrother

PhD - McGill University

Principal supervisor :

Marc Gendron-Bellemare

PhD - McGill University

Principal supervisor :

Eilif Benjamin Muller

PhD - McGill University

Haque Ishfaq

PhD - McGill University

Mohammad Sami Nur Islam Islam

Master's Research - McGill University

Arushi Jain

PhD - McGill University

PhD - McGill University

Postdoctorate - McGill University

Elaine Lau

Master's Research - McGill University

Jonathan Lebensold

Collaborating Alumni - McGill University

Undergraduate - McGill University

Ray Luo

PhD - McGill University

Principal supervisor :

G McCracken

PhD - McGill University

Nazanin Mohammadi Sepahvand

PhD - McGill University

Shahrad Mohammadzadeh

Master's Research - McGill University

Principal supervisor :

Gabriela Moisescu-Pareja

Collaborating researcher - McGill University

Co-supervisor :

Irina Rish

Padideh Nouri

PhD - Université de Montréal

Co-supervisor :

Charles Onu

PhD - McGill University

PhD - McGill University

Co-supervisor :

Nate Rahn

PhD - McGill University

Principal supervisor :

Marc Gendron-Bellemare

Sahand Rezaei-Shoshtari

PhD - McGill University

Co-supervisor :

PhD - McGill University

Co-supervisor :

PhD - McGill University

Co-supervisor :

Blake Richards

samiemandana@gmail.com

PhD - McGill University

Nishanth Anand Vemgal

PhD - McGill University

PhD - McGill University

Priyesh Vijayan

PhD - McGill University

Co-supervisor :

Samira Ebrahimi Kahou

Research Intern - McGill University

Steve Wen

Master's Research - McGill University

Co-supervisor :

Gregory Dudek

Zijing Wu

PhD - McGill University

Co-supervisor :

PhD - McGill University

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

Harry Zhao

PhD - McGill University

Co-supervisor :

Blog Posts

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Publications

Preferential Temporal Difference Learning

Nishanth Anand

2021-01-01

ICML (published)

Randomized Exploration in Reinforcement Learning with General Value Function Approximation

Haque Ishfaq

Qiwen Cui

Viet Bang Nguyen

Alex Ayoub

Zhuoran Yang

Zhaoran Wang

Lin Yang

2021-01-01

International Conference on Machine Learning (published)

Randomized Least Squares Policy Optimization

Haque Ishfaq

Zhuoran Yang

Andrei-Stefan Lupu

Viet Bang Nguyen

Lewis Liu

Riashat Islam

Zhaoran Wang

Policy Optimization (PO) methods with function approximation are one of the most popular classes of Reinforcement Learning (RL) algorithms. However, designing provably efficient policy optimization algorithms remains a challenge. Recent work in this area has focused on incorporating upper confidence bound (UCB)-style bonuses to drive exploration in policy optimization. In this paper, we present Randomized Least Squares Policy Optimization (RLSPO) which is inspired by Thompson Sampling. We prove that, in an episodic linear kernel MDP setting, RLSPO achieves $\tilde{O}(d^{3/2} H^{3/2} \sqrt{T})$ worst-case (frequentist) regret, where H is the number of episodes, T is the total number of steps and d is the feature dimension. Finally, we evaluate RLSPO empirically and show that it is competitive with existing provably efficient PO algorithms.

Temporally Abstract Partial Models

Khimya Khetarpal

Zafarali Ahmed

Gheorghe Comanici

Humans and animals have the ability to reason and make predictions about different courses of action at many time scales. In reinforcement learning, option models (Sutton, Precup & Singh, 1999; Precup, 2000) provide the framework for this kind of temporally abstract prediction and reasoning. Natural intelligent agents are also able to focus their attention on courses of action that are relevant or feasible in a given situation, sometimes termed affordable actions. In this paper, we define a notion of affordances for options, and develop temporally abstract partial option models, that take into account the fact that an option might be affordable only in certain situations. We analyze the trade-offs between estimation and approximation error in planning and learning when using such models, and identify some interesting special cases. Additionally, we empirically demonstrate the ability to learn both affordances and partial option models online resulting in improved sample efficiency and planning time in the Taxi domain.

openreview.net

On the Expressivity of Markov Reward

David Abel

Will Dabney

Anna Harutyunyan

Mark K. Ho

Michael L. Littman

Satinder Singh

Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. We conclude with an empirical study that corroborates and illustrates our theoretical findings.

openreview.net

Where Did You Learn That From? Surprising Effectiveness of Membership Inference Attacks Against Temporally Correlated Data in Deep Reinforcement Learning

Maziar Gomrokchi

Susan Amin

Hossein Aboutalebi

Alexander Wong

2021-01-01

arXiv.org (preprint)

dblp.uni-trier.de

Phylogenetic Manifold Regularization: A semi-supervised approach to predict transcription factor binding sites

Faizy Ahsan

Alexandre Drouin

François Laviolette

Mathieu Blanchette

The computational prediction of transcription factor binding sites remains a challenging problem in bioinformatics, despite significant methodological developments from the field of machine learning. Such computational models are essential to help interpret the non-coding portion of human genomes, and to learn more about the regulatory mechanisms controlling gene expression. In parallel, massive genome sequencing efforts have produced assembled genomes for hundreds of vertebrate species, but this data is underused. We present PhyloReg, a new semi-supervised learning approach that can be used for a wide variety of sequence-to-function prediction problems, and that takes advantage of hundreds of millions of years of evolution to regularize predictors and improve accuracy. We demonstrate that PhyloReg can be used to better train a previously proposed deep learning model of transcription factor binding. Simulation studies further help delineate the benefits of the approach. Gains in prediction accuracy are obtained over a broad set of transcription factors and cell types.

2020-12-16

2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (published)

doi.org

Interference and Generalization in Temporal Difference Learning

Emmanuel Bengio

Joelle Pineau

We study the link between generalization and interference in temporal-difference (TD) learning. Interference is defined as the inner product of two different gradients, representing their alignment. This quantity emerges as being of interest from a variety of observations about neural networks, parameter sharing and the dynamics of learning. We find that TD easily leads to low-interference, under-generalizing parameters, while the effect seems reversed in supervised learning. We hypothesize that the cause can be traced back to the interplay between the dynamics of interference and bootstrapping. This is supported empirically by several observations: the negative relationship between the generalization gap and interference in TD, the negative effect of bootstrapping on interference and the local coherence of targets, and the contrast between the propagation rate of information in TD(0) versus TD(

2020-11-21

Proceedings of the 37th International Conference on Machine Learning (published)

Invariant Causal Prediction for Block MDPs

Amy Zhang

Clare Lyle

Shagun Sodhani

Angelos Filos

Marta Z. Kwiatkowska

Joelle Pineau

Yarin Gal

Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges. In this paper, we consider the problem of learning abstractions that generalize in block MDPs, families of environments with a shared latent state space and dynamics structure over that latent space, but varying observations. We leverage tools from causal inference to propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting. We prove that for certain classes of environments, this approach outputs with high probability a state abstraction corresponding to the causal feature set with respect to the return. We further provide more general bounds on model error and generalization error in the multi-environment setting, in the process showing a connection between causal variable selection and the state abstraction framework for MDPs. We give empirical evidence that our methods work in both linear and nonlinear settings, attaining improved generalization over single- and multi-task baselines.

2020-11-21

Proceedings of the 37th International Conference on Machine Learning (published)

What can I do here? A Theory of Affordances in Reinforcement Learning

Khimya Khetarpal

Zafarali Ahmed

Gheorghe Comanici

David Abel

2020-11-21

Proceedings of the 37th International Conference on Machine Learning (published)

Diversity-Enriched Option-Critic

Anand Kamat

Temporal abstraction allows reinforcement learning agents to represent knowledge and develop strategies over different temporal scales. The option-critic framework has been demonstrated to learn temporally extended actions, represented as options, end-to-end in a model-free setting. However, the feasibility of option-critic remains limited by two major challenges: multiple options adopting very similar behavior, and a shrinking set of task-relevant options. These occurrences not only void the need for temporal abstraction, they also affect performance. In this paper, we tackle these problems by learning a diverse set of options. We introduce an information-theoretic intrinsic reward, which augments the task reward, as well as a novel termination objective, in order to encourage behavioral diversity in the option set. We show empirically that our proposed method is capable of learning options end-to-end on several discrete and continuous control tasks, and outperforms option-critic by a wide margin. Furthermore, we show that our approach sustainably generates robust, reusable, reliable and interpretable options, in contrast to option-critic.

2020-11-04

ArXiv (preprint)

A Study of Policy Gradient on a Class of Exactly Solvable Models

Gavin McCracken

Colin Daniels

Rosie Zhao

Anna M. Brandenberger

Prakash Panangaden

Policy gradient methods are extensively used in reinforcement learning as a way to optimize expected return. In this paper, we explore the evolution of the policy parameters, for a special class of exactly solvable POMDPs, as a continuous-state Markov chain, whose transition probabilities are determined by the gradient of the distribution of the policy's value. Our approach relies heavily on random walk theory, specifically on affine Weyl groups. We construct a class of novel partially observable environments with controllable exploration difficulty, in which the value distribution, and hence the policy parameter evolution, can be derived analytically. Using these environments, we analyze the probabilistic convergence of policy gradient to different local maxima of the value function. To our knowledge, this is the first approach developed to analytically compute the landscape of policy gradient in POMDPs for a class of such environments, leading to interesting insights into the difficulty of this problem.

2020-11-03

ArXiv (preprint)