Pierre-Luc Bacon

Core Academic Member
Assistant Professor, Université de Montréal, Facebook CIFAR AI Chair

My research lies at the intersection of reinforcement learning, optimal control, and optimization. I am interested in developing novel algorithms and in challenging our theoretical understanding of reinforcement learning in the real world. My group works on end-to-end model-based reinforcement learning methods, meta-learning, and optimal control in continuous time.

Publications

2021-12

Neural Algorithmic Reasoners are Implicit Planners
Andreea-Ioana Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang and Mladen Nikolić

2021-06

Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation
Evgenii Nikishin, Romina Abachi, Rishabh Agarwal and Pierre-Luc Bacon
arXiv preprint arXiv:2106.03273
(2021-06-06)
ui.adsabs.harvard.edu · PDF

2021-05

XLVIN: eXecuted Latent Value Iteration Nets
Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang and Mladen Nikolić
arXiv e-prints
(2021-05-04)
ui.adsabs.harvard.edu · PDF

2021-03

An Information-Theoretic Perspective on Credit Assignment in Reinforcement Learning
Dilip Arumugam, Peter Henderson and Pierre-Luc Bacon
arXiv preprint arXiv:2103.06224
(2021-03-10)
ui.adsabs.harvard.edu · PDF

2021-01

TDprop: Does Adaptive Optimization With Jacobi Preconditioning Help Temporal Difference Learning?
Joshua Romoff, Peter Henderson, David Kanaa, Emmanuel Bengio, Ahmed Touati, Pierre-Luc Bacon and Joelle Pineau
Autonomous Agents and Multi-Agent Systems
(2021-01-01)
dl.acm.org

2020-09

Graph neural induction of value iteration
Andreea Deac, Pierre-Luc Bacon and Jian Tang
arXiv preprint arXiv:2009.12604
(2020-09-26)
ui.adsabs.harvard.edu · PDF

2020-07

Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling
Yao Liu, Pierre-Luc Bacon and Emma Brunskill
ICML 2020
(2020-07-12)
proceedings.mlr.press · PDF
TDprop: Does Jacobi Preconditioning Help Temporal Difference Learning?
Joshua Romoff, Peter Henderson, David Kanaa, Emmanuel Bengio, Ahmed Touati, Pierre-Luc Bacon and Joelle Pineau
arXiv preprint arXiv:2007.02786
(2020-07-06)
ui.adsabs.harvard.edu · PDF

2020-04

Options of Interest: Temporal Abstraction with Interest Functions
Khimya Khetarpal, Martin Klissarov, Maxime Chevalier-Boisvert, Pierre-Luc Bacon and Doina Precup

2020-02

Policy Evaluation Networks
Jean Harb, Tom Schaul, Doina Precup and Pierre-Luc Bacon
arXiv preprint arXiv:2002.11833
(2020-02-26)
ui.adsabs.harvard.edu · PDF

2019-12

Entropy Regularization with Discounted Future State Distribution in Policy Gradient Methods
Riashat Islam, Raihan Seraj, Pierre-Luc Bacon and Doina Precup
arXiv preprint arXiv:1912.05104
(2019-12-11)
ui.adsabs.harvard.edu · PDF

2018-11

The Barbados 2018 List of Open Issues in Continual Learning
Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc G. Bellemare and Doina Precup
arXiv preprint arXiv:1811.07004
(2018-11-16)
ui.adsabs.harvard.edu · PDF

2018-07

Convergent Tree Backup and Retrace with Function Approximation
ICML 2018
(2018-07-03)
proceedings.mlr.press · PDF

2018-03

Constructing Temporal Abstractions Autonomously in Reinforcement Learning
AI Magazine
(2018-03-27)
doi.org

2018-02

Learning with Options that Terminate Off-Policy
Anna Harutyunyan, Peter Vrancx, Pierre-Luc Bacon, Doina Precup and Ann Nowé
AAAI 2018
(2018-02-07)
ui.adsabs.harvard.edu · PDF
When Waiting is not an Option: Learning Options with a Deliberation Cost
Jean Harb, Pierre-Luc Bacon, Martin Klissarov and Doina Precup
AAAI 2018
(2018-02-07)
dblp.uni-trier.de · PDF
OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning
Peter Henderson, Wei-Di Chang, Pierre-Luc Bacon, David Meger, Joelle Pineau and Doina Precup
AAAI 2018
(2018-02-07)
ui.adsabs.harvard.edu · PDF

2018-01

Learning Robust Options
Daniel J. Mankowitz, Timothy A. Mann, Pierre-Luc Bacon, Doina Precup and Shie Mannor

Publications collected and formatted using Paperoni