Doina Precup

Jesse Farebrother

PhD - McGill

Principal supervisor:

Marc Gendron-Bellemare

PhD - McGill

Principal supervisor:

Research collaborator - Birla Institute of Technology

Jonathan Hu

Research Master's - McGill

Howard Huang

PhD - McGill

Haque Ishfaq

Alumni collaborator - McGill

Website

Mohammad Sami Nur Islam Islam

Research Master's - McGill

Hangzhan Jin

PhD - Polytechnique

PhD - McGill

Postdoctorate - McGill

Jonathan Lebensold

Alumni collaborator - McGill

Alumni collaborator - McGill

Ray Luo

PhD - McGill

Principal supervisor:

G McCracken

PhD - McGill

Nazanin Mohammadi Sepahvand

Alumni collaborator - McGill

Shahrad Mohammadzadeh

Research Master's - McGill

Principal supervisor:

Gabriela Moisescu-Pareja

Research collaborator - McGill

Co-supervisor:

Irina Rish

Padideh Nouri

PhD - UdeM

Co-supervisor:

PhD - McGill

Co-supervisor:

Research intern - McGill

Nate Rahn

PhD - McGill

Principal supervisor:

Marc Gendron-Bellemare

Manoosh Samiei

PhD - McGill

Co-supervisor:

PhD - McGill

Co-supervisor:

PhD - McGill

Nishanth Anand Vemgal

PhD - McGill

PhD - McGill

Co-supervisor:

Samira Ebrahimi Kahou

Research intern - McGill

Zihan Wang

PhD - McGill

Website

Skipper: Combining Spatial and Temporal Abstraction to Improve Generalization

Steve Wen

Research Master's - McGill

Co-supervisor:

Gregory Dudek

Zijing Wu

PhD - McGill

Principal supervisor:

PhD - McGill

Harry Zhao

Alumni collaborator - McGill

Co-supervisor:

Blog posts

February 22, 2024

by

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Read the article

Publications

On the Expressivity of Markov Reward

David Abel

Will Dabney

Anna Harutyunyan

Mark K. Ho

Michael L. Littman

Satinder Singh

Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. We conclude with an empirical study that corroborates and illustrates our theoretical findings.

2020-12-31

Advances in Neural Information Processing Systems 34 (NeurIPS 2021) (published)

openreview.net

Where Did You Learn That From? Surprising Effectiveness of Membership Inference Attacks Against Temporally Correlated Data in Deep Reinforcement Learning

Alexander Wong

2020-12-31

arXiv.org (preprint)

dblp.uni-trier.de

Phylogenetic Manifold Regularization: A semi-supervised approach to predict transcription factor binding sites

Francois Laviolette

The computational prediction of transcription factor binding sites remains a challenging problem in bioinformatics, despite significant methodological developments from the field of machine learning. Such computational models are essential to help interpret the non-coding portion of human genomes, and to learn more about the regulatory mechanisms controlling gene expression. In parallel, massive genome sequencing efforts have produced assembled genomes for hundreds of vertebrate species, but this data is underused. We present PhyloReg, a new semi-supervised learning approach that can be used for a wide variety of sequence-to-function prediction problems, and that takes advantage of hundreds of millions of years of evolution to regularize predictors and improve accuracy. We demonstrate that PhyloReg can be used to better train a previously proposed deep learning model of transcription factor binding. Simulation studies further help delineate the benefits of the approach. Gains in prediction accuracy are obtained over a broad set of transcription factors and cell types.

2020-12-15

2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (published)

Interference and Generalization in Temporal Difference Learning

Emmanuel Bengio

Joelle Pineau

We study the link between generalization and interference in temporal-difference (TD) learning. Interference is defined as the inner product of two different gradients, representing their alignment. This quantity emerges as being of interest from a variety of observations about neural networks, parameter sharing and the dynamics of learning. We find that TD easily leads to low-interference, under-generalizing parameters, while the effect seems reversed in supervised learning. We hypothesize that the cause can be traced back to the interplay between the dynamics of interference and bootstrapping. This is supported empirically by several observations: the negative relationship between the generalization gap and interference in TD, the negative effect of bootstrapping on interference and the local coherence of targets, and the contrast between the propagation rate of information in TD(0) versus TD(

2020-11-20

Proceedings of the 37th International Conference on Machine Learning (published)

proceedings.mlr.press

Invariant Causal Prediction for Block MDPs

Amy Zhang

Clare Lyle

Shagun Sodhani

Angelos Filos

Marta Kwiatkowska

Joelle Pineau

Yarin Gal

Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges. In this paper, we consider the problem of learning abstractions that generalize in block MDPs, families of environments with a shared latent state space and dynamics structure over that latent space, but varying observations. We leverage tools from causal inference to propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting. We prove that for certain classes of environments, this approach outputs with high probability a state abstraction corresponding to the causal feature set with respect to the return. We further provide more general bounds on model error and generalization error in the multi-environment setting, in the process showing a connection between causal variable selection and the state abstraction framework for MDPs. We give empirical evidence that our methods work in both linear and nonlinear settings, attaining improved generalization over single- and multi-task baselines.

2020-11-20

Proceedings of the 37th International Conference on Machine Learning (published)

proceedings.mlr.press

What can I do here? A Theory of Affordances in Reinforcement Learning

Khimya Khetarpal

Zafarali Ahmed

Gheorghe Comanici

David Abel

Reinforcement learning algorithms usually assume that all actions are always available to an agent. However, both people and animals understand the general link between the features of their environment and the actions that are feasible. Gibson (1977) coined the term "affordances" to describe the fact that certain states enable an agent to do certain actions, in the context of embodied agents. In this paper, we develop a theory of affordances for agents who learn and plan in Markov Decision Processes. Affordances play a dual role in this case. On one hand, they allow faster planning, by reducing the number of actions available in any given situation. On the other hand, they facilitate more efficient and precise learning of transition models from data, especially when such models require function approximation. We establish these properties through theoretical results as well as illustrative examples. We also propose an approach to learn affordances and use it to estimate transition models that are simpler and generalize better.

2020-11-20

Proceedings of the 37th International Conference on Machine Learning (published)

proceedings.mlr.press

Diversity-Enriched Option-Critic

Anand Kamat

Temporal abstraction allows reinforcement learning agents to represent knowledge and develop strategies over different temporal scales. The option-critic framework has been demonstrated to learn temporally extended actions, represented as options, end-to-end in a model-free setting. However, the feasibility of option-critic remains limited due to two major challenges: multiple options adopting very similar behavior, or a shrinking set of task-relevant options. These occurrences not only void the need for temporal abstraction, they also affect performance. In this paper, we tackle these problems by learning a diverse set of options. We introduce an information-theoretic intrinsic reward, which augments the task reward, as well as a novel termination objective, in order to encourage behavioral diversity in the option set. We show empirically that our proposed method is capable of learning options end-to-end on several discrete and continuous control tasks, and outperforms option-critic by a wide margin. Furthermore, we show that our approach sustainably generates robust, reusable, reliable and interpretable options, in contrast to option-critic.

2020-11-03

arXiv (preprint)

arxiv.org

A Study of Policy Gradient on a Class of Exactly Solvable Models

Gavin McCracken

Colin Daniels

Rosie Zhao

Anna M. Brandenberger

Prakash Panangaden

Policy gradient methods are extensively used in reinforcement learning as a way to optimize expected return. In this paper, we explore the evolution of the policy parameters, for a special class of exactly solvable POMDPs, as a continuous-state Markov chain, whose transition probabilities are determined by the gradient of the distribution of the policy's value. Our approach relies heavily on random walk theory, specifically on affine Weyl groups. We construct a class of novel partially observable environments with controllable exploration difficulty, in which the value distribution, and hence the policy parameter evolution, can be derived analytically. Using these environments, we analyze the probabilistic convergence of policy gradient to different local maxima of the value function. To our knowledge, this is the first approach developed to analytically compute the landscape of policy gradient in POMDPs for a class of such environments, leading to interesting insights into the difficulty of this problem.

2020-11-02

arXiv (preprint)

arxiv.org

A Fully Tensorized Recurrent Neural Network

Charles Onu

Jacob Miller

2020-10-07

arXiv (preprint)

arxiv.org

Keynote Lecture - Building Knowledge For AI Agents With Reinforcement Learning

Summary form only given, as follows. The complete presentation was not made available for publication as part of the conference proceedings. Reinforcement learning allows autonomous agents to learn how to act in a stochastic, unknown environment, with which they can interact. Deep reinforcement learning, in particular, has achieved great success in well-defined application domains, such as Go or chess, in which an agent has to learn how to act and there is a clear success criterion. In this talk, I will focus on the potential role of reinforcement learning as a tool for building knowledge representations in AI agents whose goal is to perform continual learning. I will examine a key concept in reinforcement learning, the value function, and discuss its generalization to support various forms of predictive knowledge. I will also discuss the role of temporally extended actions, and their associated predictive models, in learning procedural knowledge. In order to tame the possible complexity of learning knowledge representations, reinforcement learning agents can use the concepts of intents (i.e., intended consequences of courses of actions) and affordances (which capture knowledge about where actions can be applied). Finally, I will discuss the challenge of how to evaluate reinforcement learning agents whose goal is not just to control their environment, but also to build knowledge about their world.

2020-09-02

International Conference on Computational Photography (published)

Fast reinforcement learning with generalized policy updates

Andre Barreto

Shaobo Hou

Diana Borsa

David Silver

The combination of reinforcement learning with deep learning is a promising approach to tackle important sequential decision-making problems that are currently intractable. One obstacle to overcome is the amount of data needed by learning systems of this type. In this article, we propose to address this issue through a divide-and-conquer approach. We argue that complex decision problems can be naturally decomposed into multiple tasks that unfold in sequence or in parallel. By associating each task with a reward function, this problem decomposition can be seamlessly accommodated within the standard reinforcement-learning formalism. The specific way we do so is through a generalization of two fundamental operations in reinforcement learning: policy improvement and policy evaluation. The generalized versions of these operations allow one to leverage the solution of some tasks to speed up the solution of others. If the reward function of a task can be well approximated as a linear combination of the reward functions of tasks previously solved, we can reduce a reinforcement-learning problem to a simpler linear regression. When this is not the case, the agent can still exploit the task solutions by using them to interact with and learn about the environment. Both strategies considerably reduce the amount of data needed to solve a reinforcement-learning problem.

2020-08-16

Proceedings of the National Academy of Sciences of the United States of America (published)

A Brief Look at Generalization in Visual Meta-Reinforcement Learning

Safa Alver