Portrait of Pierre-Luc Bacon

Pierre-Luc Bacon

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Reinforcement Learning

Biography

Pierre-Luc Bacon is an associate professor in the Department of Computer Science and Operations Research at Université de Montréal. He is also a member of Mila – Quebec Artificial Intelligence Institute and of IVADO, and holds a Facebook-CIFAR chair. He leads a research group working on the challenge posed by the curse of horizon in reinforcement learning and optimal control.

Current Students

Research Intern - UdeM
Collaborating Alumni - UdeM
Co-supervisor:
Postdoctorate - UdeM
Co-supervisor:
PhD - UdeM
Master's Research - Polytechnique
Principal supervisor:
Master's Research - UdeM
Collaborating Alumni - UdeM
Research Intern - UdeM
Master's Research - UdeM
Principal supervisor:
PhD - UdeM
PhD - UdeM
Master's Research - UdeM
PhD - UdeM
PhD - UdeM
Co-supervisor:
PhD - UdeM
Postdoctorate - UdeM
Principal supervisor:
Master's Research - UdeM

Publications

Exploring Scaling Trends in LLM Robustness
Nikolaus H. R. Howe
Michał Zając
Ian R. McKenzie
Oskar John Hollinsworth
Tom Tseng
Adam Gleave
Language model capabilities predictably improve from scaling the model’s size and training data. Motivated by this, increasingly large language models have been trained, yielding an array of impressive capabilities. Yet these models suffer from adversarial prompts such as “jailbreaks” that hijack models to perform undesired behavior, posing a significant risk of misuse. Prior work has found that computer vision models become more robust with model and data scaling, raising the question: does language model robustness also improve with scale? We study this question empirically, finding that larger models respond substantially more effectively to adversarial training, but there is little to no benefit from model scale in the absence of defenses.
Maximum entropy GFlowNets with soft Q-learning
Double Gumbel Q-Learning
David Yu-Tung Hui
Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control
Nathan Rahn
Pierluca D'Oro
Harley Wiltzer
Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy. To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy. Taken together, our results provide new insight into the optimization, evaluation, and design of agents.
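
The following is a minimal sketch, not the authors' code, of the probing idea described above: apply many candidate single updates to a policy's parameters and record the return obtained after each one, so that a wide spread of returns flags a noisy neighborhood of the landscape. The toy estimate_return function stands in for rolling out a policy in an environment; all names and constants here are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def estimate_return(theta):
    """Toy stand-in for the expected return of policy parameters theta."""
    # A smooth component plus a rugged term, so some directions are much noisier than others.
    return -float(np.sum(theta ** 2)) + 0.5 * float(np.sin(50.0 * theta[0]))

def neighborhood_returns(theta, step_size=0.05, n_updates=200):
    """Distribution of returns after n_updates random single updates of theta."""
    returns = []
    for _ in range(n_updates):
        direction = rng.normal(size=theta.shape)
        direction /= np.linalg.norm(direction)          # unit-norm update direction
        returns.append(estimate_return(theta + step_size * direction))
    return np.array(returns)

theta = rng.normal(size=4)
post_update = neighborhood_returns(theta)
spread = np.percentile(post_update, 95) - np.percentile(post_update, 5)
print(f"mean return {post_update.mean():.3f}, spread (p95 - p5) {spread:.3f}")

Reading the spread of this post-update return distribution, rather than a single return estimate, is one way to operationalize the distributional view the abstract refers to.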
When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment
Tianwei Ni
Michel Ma
Benjamin Eysenbach
Reinforcement learning (RL) algorithms face two distinct challenges: learning effective representations of past and present observations, and determining how actions influence future returns. Both challenges involve modeling long-term dependencies. The Transformer architecture has been very successful at solving problems that involve long-term dependencies, including in the RL domain. However, the underlying reason for the strong performance of Transformer-based RL methods remains unclear: is it because they learn effective memory, or because they perform effective credit assignment? After introducing formal definitions of memory length and credit assignment length, we design simple configurable tasks to measure these distinct quantities. Our empirical results reveal that Transformers can enhance the memory capability of RL algorithms, scaling up to tasks that require memorizing observations…
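
As a rough illustration of the two quantities the abstract separates, here is a minimal sketch (assuming a simplified reading of the definitions, and not the authors' benchmark) of two toy episodic tasks: one where reward depends on remembering an observation from many steps earlier (memory length), and one where an early action only influences reward many steps later (credit assignment length). The random "agent" choices are placeholders.

import random

def memory_task(memory_len, horizon=20):
    """Final reward requires recalling the observation seen memory_len steps earlier."""
    observations = [random.randint(0, 1) for _ in range(horizon)]
    target = observations[horizon - 1 - memory_len]    # what the agent must remember
    guess = random.randint(0, 1)                       # placeholder for the agent's recall
    return 1.0 if guess == target else 0.0

def credit_assignment_task(credit_len, horizon=20):
    """An action taken at step 0 only pays off credit_len steps later."""
    early_action = random.randint(0, 1)                # placeholder for the agent's choice
    rewards = [0.0] * horizon
    rewards[min(credit_len, horizon - 1)] = float(early_action)  # delayed consequence
    return sum(rewards)

print(memory_task(memory_len=10), credit_assignment_task(credit_len=10))

Dialing memory_len and credit_len independently is the spirit of the configurable tasks mentioned above: a sequence model can excel at the first while contributing little to the second.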
Goal-conditioned GFlowNets for Controllable Multi-Objective Molecular Design
In recent years, in-silico molecular design has received much attention from the machine learning community. When designing a new compound for pharmaceutical applications, there are usually multiple properties of such molecules that need to be optimised: binding energy to the target, synthesizability, toxicity, EC50, and so on. While previous approaches have employed a scalarization scheme to turn the multi-objective problem into a preference-conditioned single objective, it has been established that this kind of reduction may produce solutions that tend to slide towards the extreme points of the objective space when presented with a problem that exhibits a concave Pareto front. In this work we experiment with an alternative formulation of goal-conditioned molecular generation to obtain a more controllable conditional model that can uniformly explore solutions along the entire Pareto front.
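
To make the scalarization failure mode concrete, here is a minimal sketch (not the paper's method; the front and preference weights are illustrative placeholders) showing how a preference-weighted sum over a concave Pareto front selects only its extreme points, whereas a goal-conditioned model is meant to reach intermediate trade-offs as well.

import numpy as np

# An illustrative concave Pareto front in a 2D objective space
# (think binding energy vs. synthesizability, both to be maximized).
t = np.linspace(0.0, 1.0, 101)
front = np.stack([t, 1.0 - np.sqrt(t)], axis=1)

for w1 in (0.2, 0.5, 0.8):
    weights = np.array([w1, 1.0 - w1])
    best = front[np.argmax(front @ weights)]   # scalarized optimum for this preference
    print(f"preference {weights}: scalarized optimum at {best.round(2)}")

# For this concave front every preference lands on an endpoint, so sweeping the
# preference weights never recovers the middle of the front.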
Block-State Transformers
Mahan Fathi
Jonathan Pilault
Orhan Firat
State space models (SSMs) have shown impressive results on tasks that require modeling long-range dependencies and efficiently scale to long sequences owing to their subquadratic runtime complexity. Originally designed for continuous signals, SSMs have shown superior performance on a plethora of tasks, in vision and audio; however, SSMs still lag Transformer performance in language modeling tasks. In this work, we propose a hybrid layer named Block-State Transformer (BST), that internally combines an SSM sublayer for long-range contextualization, and a Block Transformer sublayer for short-term representation of sequences. We study three different, and completely parallelizable, variants that integrate SSMs and block-wise attention. We show that our model outperforms similar Transformer-based architectures on language modeling perplexity and generalizes to longer sequences. In addition, the Block-State Transformer demonstrates a more than tenfold increase in speed at the layer level compared to the Block-Recurrent Transformer when model parallelization is employed.
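
The hybrid layer described above can be sketched roughly as follows. This is a simplified toy, not the released Block-State Transformer: the SSM sublayer is stood in for by a sequential per-channel linear recurrence (the paper's sublayer is fully parallelizable), and the Block Transformer sublayer is ordinary self-attention restricted to fixed-size chunks. All dimensions and names are assumptions for illustration.

import torch
import torch.nn as nn

class ToyBlockStateLayer(nn.Module):
    def __init__(self, d_model=64, block_len=16, n_heads=4):
        super().__init__()
        self.block_len = block_len
        # Stand-in for the SSM sublayer: a learned per-channel decay recurrence.
        self.decay = nn.Parameter(torch.full((d_model,), 0.9))
        self.in_proj = nn.Linear(d_model, d_model)
        # Block Transformer sublayer: attention within non-overlapping blocks.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        assert t % self.block_len == 0
        # Long-range contextualization via a simple linear recurrence over time.
        u = self.in_proj(x)
        state = torch.zeros(b, d, device=x.device)
        context = []
        for step in range(t):
            state = self.decay * state + u[:, step]
            context.append(state)
        context = torch.stack(context, dim=1)
        # Short-range modeling: self-attention inside each block, conditioned on
        # the long-range context by simple addition.
        h = (x + context).reshape(b * (t // self.block_len), self.block_len, d)
        h, _ = self.attn(h, h, h)
        return self.out(h.reshape(b, t, d)) + x  # residual connection

layer = ToyBlockStateLayer()
print(layer(torch.randn(2, 64, 64)).shape)      # torch.Size([2, 64, 64])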