
Pierre-Luc Bacon

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Reinforcement Learning

Biography

Pierre-Luc Bacon is an associate professor in the Department of Computer Science and Operations Research at Université de Montréal. He is also a member of Mila – Quebec Artificial Intelligence Institute and of IVADO, and holds a Facebook-CIFAR chair. He leads a research group working on the challenge posed by the curse of horizon in reinforcement learning and optimal control.

Current Students

Alumni collaborator - UdeM
Alumni collaborator - UdeM
PhD - UdeM
Co-supervisor:
Postdoctorate - UdeM
Co-supervisor:
PhD - UdeM
Research Master's - UdeM
Alumni collaborator - UdeM
Research intern - UdeM
Research Master's - UdeM
Principal supervisor:
PhD - UdeM
PhD - UdeM
Research Master's - UdeM
PhD - UdeM
PhD - UdeM
Co-supervisor:
PhD - UdeM
Postdoctorate - UdeM
Principal supervisor:
Research Master's - UdeM

Publications

Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons
Simon Dufort-Labbé
Pierluca D'Oro
Evgenii Nikishin
Aristide Baratin
Mol-MoE: Training Preference-Guided Routers for Molecule Generation
Diego Calanzone
Pierluca D'Oro
Recent advances in language models have enabled framing molecule generation as sequence modeling. However, existing approaches often rely on single-objective reinforcement learning, limiting their applicability to real-world drug design, where multiple competing properties must be optimized. Traditional multi-objective reinforcement learning (MORL) methods require costly retraining for each new objective combination, making rapid exploration of trade-offs impractical. To overcome these limitations, we introduce Mol-MoE, a mixture-of-experts (MoE) architecture that enables efficient test-time steering of molecule generation without retraining. Central to our approach is a preference-based router training objective that incentivizes the router to combine experts in a way that aligns with user-specified trade-offs. This provides improved flexibility in exploring the chemical property space at test time, facilitating rapid trade-off exploration. Benchmarking against state-of-the-art methods, we show that Mol-MoE achieves superior sample quality and steerability.
MaestroMotif: Skill Design from Artificial Intelligence Feedback
Martin Klissarov
Mikael Henaff
Roberta Raileanu
Shagun Sodhani
Amy Zhang
Marlos C. Machado
Pierluca D'Oro
Describing skills in natural language has the potential to provide an accessible way to inject human knowledge about decision-making into an AI system. We present MaestroMotif, a method for AI-assisted skill design, which yields high-performing and adaptable agents. MaestroMotif leverages the capabilities of Large Language Models (LLMs) to effectively create and reuse skills. It first uses an LLM's feedback to automatically design rewards corresponding to each skill, starting from their natural language description. Then, it employs an LLM's code generation abilities, together with reinforcement learning, for training the skills and combining them to implement complex behaviors specified in language. We evaluate MaestroMotif using a suite of complex tasks in the NetHack Learning Environment (NLE), demonstrating that it surpasses existing approaches in both performance and usability.
Neural differential equations for temperature control in buildings under demand response programs
Vincent Taboga
Clement Gehring
Mathieu Le Cam
Effects of Scale on Language Model Robustness
Nikolaus H. R. Howe
Ian R. McKenzie
Oskar John Hollinsworth
Michał Zając
Tom Tseng
Aaron David Tucker
Adam Gleave
Language models exhibit scaling laws, whereby increasing model and dataset size yields predictable decreases in negative log likelihood, unlocking a dazzling array of capabilities. This phenomenon spurs many companies to train ever larger models in pursuit of ever improved performance. Yet, these models are vulnerable to adversarial inputs such as "jailbreaks" and prompt injections that induce models to perform undesired behaviors, posing a growing risk as models become more capable. Prior work indicates that computer vision models become more robust with model and data scaling, raising the question: does language model robustness also improve with scale? We study this question empirically in the classification setting, finding that without explicit defense training, larger models tend to be modestly more robust on most tasks, though the effect is not reliable. Even with the advantage conferred by scale, undefended models remain easy to attack in absolute terms, and we thus turn our attention to explicitly training models for adversarial robustness, which we show to be a much more compute-efficient defense than scaling model size alone. In this setting, we also observe that adversarially trained larger models generalize faster and better to modified attacks not seen during training when compared with smaller models. Finally, we analyze the offense/defense balance of increasing compute, finding parity in some settings and an advantage for offense in others, suggesting that adversarial training alone is not sufficient to solve robustness, even at greater model scales.
Scaling Trends in Language Model Robustness
Nikolaus H. R. Howe
Ian R. McKenzie
Oskar John Hollinsworth
Michał Zając
Tom Tseng
Aaron David Tucker
Adam Gleave
Do Transformer World Models Give Better Policy Gradients?
Michel Ma
Tianwei Ni
Clement Gehring
Pierluca D'Oro
Exploring Scaling Trends in LLM Robustness
Nikolaus H. R. Howe
Michał Zając
Ian R. McKenzie
Oskar John Hollinsworth
Tom Tseng
Aaron David Tucker
Adam Gleave
Language model capabilities predictably improve from scaling a model's size and training data. Motivated by this, increasingly large language models have been trained, yielding an array of impressive capabilities. Yet these models are vulnerable to adversarial prompts, such as "jailbreaks" that hijack models to perform undesired behaviors, posing a significant risk of misuse. Prior work indicates that computer vision models become more robust with model and data scaling, raising the question: does language model robustness also improve with scale? We study this question empirically, finding that larger models respond substantially better to adversarial training, but there is little to no benefit from model scale in the absence of explicit defenses.