Doina Precup

Sumana Basu

PhD - McGill University

Co-supervisor :

Adriana Romero Soriano

PhD - McGill University

Lynn Cherif

Master's Research - McGill University

Co-supervisor :

PhD - McGill University

Co-supervisor :

PhD - McGill University

Principal supervisor :

David Meger

Jonathan Colaço Carr

Master's Research - McGill University

Principal supervisor :

Prakash Panangaden

Élodie Coté-Gauthier

Collaborating researcher - McGill University

Franco Del Balso

Research Intern - Université de Montréal

Jesse Farebrother

PhD - McGill University

Principal supervisor :

Marc Gendron-Bellemare

PhD - McGill University

Principal supervisor :

Eilif Benjamin Muller

PhD - McGill University

Haque Ishfaq

PhD - McGill University

Mohammad Sami Nur Islam Islam

Master's Research - McGill University

Arushi Jain

PhD - McGill University

PhD - McGill University

Postdoctorate - McGill University

Elaine Lau

Master's Research - McGill University

Jonathan Lebensold

Collaborating Alumni - McGill University

Undergraduate - McGill University

Ray Luo

PhD - McGill University

Principal supervisor :

G McCracken

PhD - McGill University

Nazanin Mohammadi Sepahvand

PhD - McGill University

Shahrad Mohammadzadeh

Master's Research - McGill University

Principal supervisor :

Gabriela Moisescu-Pareja

Collaborating researcher - McGill University

Co-supervisor :

Irina Rish

Padideh Nouri

PhD - Université de Montréal

Co-supervisor :

Charles Onu

PhD - McGill University

PhD - McGill University

Co-supervisor :

Nate Rahn

PhD - McGill University

Principal supervisor :

Marc Gendron-Bellemare

Sahand Rezaei-Shoshtari

PhD - McGill University

Co-supervisor :

PhD - McGill University

Co-supervisor :

PhD - McGill University

Co-supervisor :

Blake Richards

samiemandana@gmail.com

PhD - McGill University

Nishanth Anand Vemgal

PhD - McGill University

PhD - McGill University

Priyesh Vijayan

PhD - McGill University

Co-supervisor :

Samira Ebrahimi Kahou

Research Intern - McGill University

Steve Wen

Master's Research - McGill University

Co-supervisor :

Gregory Dudek

Zijing Wu

PhD - McGill University

Co-supervisor :

PhD - McGill University

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

Harry Zhao

PhD - McGill University

Co-supervisor :

Blog Posts

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Publications

Multiple Kernel Learning-Based Transfer Regression for Electric Load Forecasting

Di Wu

Boyu Wang

Benoit Boulet

Electric load forecasting, especially short-term load forecasting (STLF), is becoming more and more important for power system operation. We propose to use multiple kernel learning (MKL) for residential electric load forecasting, which provides more flexibility than traditional kernel methods. Computation time is an important issue for short-term forecasting, especially for energy scheduling. However, conventional MKL methods usually lead to complicated optimization problems. Another practical issue for this application is that there may be a very limited amount of data available to train a reliable forecasting model for a new house, while at the same time we may have historical data collected from other houses which can be leveraged to improve the prediction performance for the new house. In this paper, we propose a boosting-based framework for MKL regression to deal with the aforementioned issues for STLF. In particular, we first adopt boosting to learn an ensemble of multiple kernel regressors and then extend this framework to the context of transfer learning. Furthermore, we consider two different settings: homogeneous transfer learning and heterogeneous transfer learning. Experimental results on residential data sets demonstrate that forecasting error can be reduced by a large margin with the knowledge learned from other houses.

2020-03-01

IEEE Transactions on Smart Grid (published)

doi.org

Policy Evaluation Networks

Jean Harb

Tom Schaul

Pierre-Luc Bacon

2020-02-26

ArXiv (preprint)

Provably efficient reconstruction of policy networks

Bogdan Mazoure

Thang Doan

Tianyu Li

Recent research has shown that learning policies parametrized by large neural networks can achieve significant success on challenging reinforcement learning problems. However, when memory is limited, it is not always possible to store such models exactly for inference, and compressing the policy into a compact representation might be necessary. We propose a general framework for policy representation, which reduces this problem to finding a low-dimensional embedding of a given density function in a separable inner product space. Our framework allows us to derive strong theoretical guarantees, controlling the error of the reconstructed policies. Such guarantees are typically lacking in black-box models, but are very desirable in risk-sensitive tasks. Our experimental results suggest that the reconstructed policies can use less than 10% of the number of parameters in the original networks, while incurring almost no decrease in rewards.

2020-02-07

ArXiv (preprint)

Representation of Reinforcement Learning Policies in Reproducing Kernel Hilbert Spaces.

Bogdan Mazoure

Thang Doan

Tianyu Li

We propose a general framework for policy representation for reinforcement learning tasks. This framework involves finding a low-dimensional embedding of the policy on a reproducing kernel Hilbert space (RKHS). The usage of RKHS-based methods allows us to derive strong theoretical guarantees on the expected return of the reconstructed policy. Such guarantees are typically lacking in black-box models, but are very desirable in tasks requiring stability. We conduct several experiments on classic RL domains. The results confirm that the policies can be robustly embedded in a low-dimensional space while the embedded policy incurs almost no decrease in return.

A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms

Philip Amortila

Prakash Panangaden

Marc Gendron-Bellemare

2020-01-01

AISTATS (published)

proceedings.mlr.press

On Efficiency in Hierarchical Reinforcement Learning

Zheng Wen

Morteza Ibrahimi

Andre Barreto

Benjamin Van Roy

Satinder Singh

Hierarchical Reinforcement Learning (HRL) approaches promise to provide more efficient solutions to sequential decision making problems, both in terms of statistical as well as computational efficiency. While this has been demonstrated empirically over time in a variety of tasks, theoretical results quantifying the benefits of such methods are still few and far between. In this paper, we discuss the kind of structure in a Markov decision process which gives rise to efficient HRL methods. Specifically, we formalize the intuition that HRL can exploit well repeating "subMDPs", with similar reward and transition structure. We show that, under reasonable assumptions, a model-based Thompson sampling-style HRL algorithm that exploits this structure is statistically efficient, as established through a finite-time regret bound. We also establish conditions under which planning with structure-induced options is near-optimal and computationally efficient.

An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay

Scott Fujimoto

David Meger

Prioritized Experience Replay (PER) is a deep reinforcement learning technique in which agents learn from transitions sampled with non-uniform probability proportionate to their temporal-difference error. We show that any loss function evaluated with non-uniformly sampled data can be transformed into another uniformly sampled loss function with the same expected gradient. Surprisingly, we find in some environments PER can be replaced entirely by this new loss function without impact to empirical performance. Furthermore, this relationship suggests a new branch of improvements to PER by correcting its uniformly sampled loss function equivalent. We demonstrate the effectiveness of our proposed modifications to PER and the equivalent loss function in several MuJoCo and Atari environments.

Forethought and Hindsight in Credit Assignment

Veronica Chelu

Hado van Hasselt

We address the problem of credit assignment in reinforcement learning and explore fundamental questions regarding the way in which an agent can best use additional computation to propagate new information, by planning with internal models of the world to improve its predictions. Particularly, we work to understand the gains and peculiarities of planning employed as forethought via forward models or as hindsight operating with backward models. We establish the relative merits, limitations and complementary properties of both planning mechanisms in carefully constructed scenarios. Further, we investigate the best use of models in planning, primarily focusing on the selection of states in which predictions should be (re)-evaluated. Lastly, we discuss the issue of model estimation and highlight a spectrum of methods that stretch from explicit environment-dynamics predictors to more abstract planner-aware models.

META-Learning State-based Eligibility Traces for More Sample-Efficient Policy Evaluation

Mingde Zhao

Sitao Luan

Ian Porada

Xiao-Wen Chang

Temporal-Difference (TD) learning is a standard and very successful reinforcement learning approach, at the core of both algorithms that learn the value of a given policy, as well as algorithms which learn how to improve policies. TD-learning with eligibility traces provides a way to boost sample efficiency by temporal credit assignment, i.e. deciding which portion of a reward should be assigned to predecessor states that occurred at different previous times, controlled by a parameter

2020-01-01

AAMAS (published)

dblp.uni-trier.de

Reward Propagation Using Graph Convolutional Networks

Martin Klissarov

Reward Redistribution Mechanisms in Multi-agent Reinforcement Learning

Aly Ibrahim

Anirudha Jitani

Daoud Piracha

In typical Multi-Agent Reinforcement Learning (MARL) settings, each agent acts to maximize its individual reward objective. However, for collective social welfare maximization, some agents may need to act non-selfishly. We propose a reward shaping mechanism using extrinsic motivation for achieving modularity and increased cooperation among agents in Sequential Social Dilemma (SSD) problems. Our mechanism, inspired by capitalism, provides extrinsic motivation to agents by redistributing a portion of collected rewards based on each agent’s individual contribution towards team rewards. We demonstrate empirically that this mechanism leads to higher collective welfare relative to existing baselines. Furthermore, this reduces free rider issues and leads to more diverse policies. We evaluate our proposed mechanism for already specialised agents that are pre-trained for specific roles. We show that our mechanism, in the most challenging CleanUp environment, significantly outperforms two baselines (based roughly on socialism and anarchy) and accumulates 2-3 times higher rewards in an easier setting of the environment.

Value-driven Hindsight Modelling

Arthur Guez

Fabio Viola

Theophane Weber

Lars Buesing

Steven Kapturowski