Pierre-Luc Bacon

Biography

Pierre-Luc Bacon is an associate professor in the Department of Computer Science and Operations Research at Université de Montréal. He is also a member of Mila (Quebec Artificial Intelligence Institute) and of IVADO, and holds a Facebook-CIFAR chair. He leads a research group working on the challenge posed by the curse of horizon in reinforcement learning and optimal control.

Current Students

PhD - UdeM

Kamen Damov

Professional Master's - UdeM

Esther Derman

Alumni collaborator - UdeM

Co-supervisor:

PhD - UdeM

Research Master's - Polytechnique

Principal supervisor:

Hanane Dagdougui

Arielle Gazzé

Research Master's - UdeM

Adrien Goldszal

Alumni collaborator - UdeM

Niki Howe

PhD - UdeM

PhD - UdeM

Research Master's - UdeM

Principal supervisor:

Yashar Hezaveh

Lu Li

PhD - UdeM

Michel Ma

PhD - UdeM

Vamsi Krishna Munjuluri V S

Artiom Matvei

Research Master's - UdeM

Aneri Muni

PhD - UdeM

Research Master's - UdeM

Tianwei Ni

PhD - UdeM

Alumni collaborator - UdeM

Co-supervisor:

PhD - UdeM

PhD - UdeM

Postdoctorate - UdeM

Yihao Sun

PhD - UdeM

Postdoctorate - UdeM

Principal supervisor:

Research Master's - UdeM

Blog Posts

August 7, 2024

Neural Differential Equations for Temperature Control in Buildings under Demand Response Programs

by

Vincent Taboga

Clement Gehring

Mathieu Le Cam

Hanane Dagdougui

Pierre-Luc Bacon

Direct Behavior Specification via Constrained Reinforcement Learning

August 31, 2022

by

Julien Roy

Roger Girgis

Joshua Romoff

Pierre-Luc Bacon

Chris Pal

Publications

Block-State Transformers

2023-09-21

NeurIPS.cc/2023/Conference (poster)

Double Gumbel Q-Learning.

Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control

Nathan Rahn

Pierluca D'Oro

Harley Wiltzer

Pierre-Luc Bacon

Marc Gendron-Bellemare

Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy. To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy. Taken together, our results provide new insight into the optimization, evaluation, and design of agents.

When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment

Tianwei Ni

Michel Ma

Benjamin Eysenbach

Pierre-Luc Bacon

Reinforcement learning (RL) algorithms face two distinct challenges: learning effective representations of past and present observations, and determining how actions influence future returns. Both challenges involve modeling long-term dependencies. The Transformer architecture has been very successful to solve problems that involve long-term dependencies, including in the RL domain. However, the underlying reason for the strong performance of Transformer-based RL methods remains unclear: is it because they learn effective memory, or because they perform effective credit assignment? After introducing formal definitions of memory length and credit assignment length, we design simple configurable tasks to measure these distinct quantities. Our empirical results reveal that Transformers can enhance the memory capability of RL algorithms, scaling up to tasks that require memorizing observations

Goal-conditioned GFlowNets for Controllable Multi-Objective Molecular Design

In recent years, in-silico molecular design has received much attention from the machine learning community. When designing a new compound for pharmaceutical applications, there are usually multiple properties of such molecules that need to be optimised: binding energy to the target, synthesizability, toxicity, EC50, and so on. While previous approaches have employed a scalarization scheme to turn the multi-objective problem into a preference-conditioned single objective, it has been established that this kind of reduction may produce solutions that tend to slide towards the extreme points of the objective space when presented with a problem that exhibits a concave Pareto front. In this work we experiment with an alternative formulation of goal-conditioned molecular generation to obtain a more controllable conditional model that can uniformly explore solutions along the entire Pareto front.

2023-06-23

ICML.cc/2023/Workshop/DeployableGenerativeAI (published)

Block-State Transformers

State space models (SSMs) have shown impressive results on tasks that require modeling long-range dependencies and efficiently scale to long sequences owing to their subquadratic runtime complexity. Originally designed for continuous signals, SSMs have shown superior performance on a plethora of tasks, in vision and audio; however, SSMs still lag Transformer performance in Language Modeling tasks. In this work, we propose a hybrid layer named Block-State Transformer (BST), that internally combines an SSM sublayer for long-range contextualization, and a Block Transformer sublayer for short-term representation of sequences. We study three different, and completely parallelizable, variants that integrate SSMs and block-wise attention. We show that our model outperforms similar Transformer-based architectures on language modeling perplexity and generalizes to longer sequences. In addition, the Block-State Transformer demonstrates more than tenfold increase in speed at the layer level compared to the Block-Recurrent Transformer when model parallelization is employed.

2023-06-15

arXiv (preprint)

Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier

Marc Gendron-Bellemare

Aaron Courville

Increasing the replay ratio, the number of updates of an agent's parameters per environment interaction, is an appealing strategy for improving the sample efficiency of deep reinforcement learning algorithms. In this work, we show that fully or partially resetting the parameters of deep reinforcement learning agents causes better replay ratio scaling capabilities to emerge. We push the limits of the sample efficiency of carefully-modified algorithms by training them using an order of magnitude more updates than usual, significantly improving their performance in the Atari 100k and DeepMind Control Suite benchmarks. We then provide an analysis of the design choices required for favorable replay ratio scaling to be possible and discuss inherent limits and tradeoffs.

2023-02-01

ICLR.cc/2023/Conference (notable)