Pierre-Luc Bacon

diego.calanzone@mila.quebec

Diego Calanzone

Collaborating researcher - University of Trento

Postdoctorate - Université de Montréal

Co-supervisor:

Érick Delage

esther.derman@mila.quebec

Evgenii Nikishin

PhD - Université de Montréal

Co-supervisor:

Aaron Courville

evgenii.nikishin@mila.quebec

Website

justin.veilleux@mila.quebec

Julien Roy

PhD - Polytechnique Montréal

Principal supervisor:

Research Intern - Université de Montréal

Léa Côté-Turcotte

Master's Research - Université de Montréal

lea.cote-turcotte@mila.quebec

Mahan Fathi

Master's Research - Université de Montréal

mahan.fathi@mila.quebec

Website

Michel Ma

PhD - Université de Montréal

michel.ma@mila.quebec

pierluca.doro@mila.quebec

Niki Howe

PhD - Université de Montréal

PhD - Université de Montréal

Co-supervisor:

Marc Gendron-Bellemare

Website

Samy Rasmy

Research Intern - Université de Montréal

samy.rasmy@mila.quebec

sobhan.mohammadpour@mila.quebec

Sobhan Mohammadpour

Master's Research - Université de Montréal

Tianwei Ni

PhD - Université de Montréal

tianwei.ni@mila.quebec

PhD - Polytechnique Montréal

Principal supervisor:

Hanane Dagdougui

vincent.taboga@mila.quebec

Direct Behavior Specification via Constrained Reinforcement Learning

Blog Posts

August 31, 2022

Julien Roy

Roger Girgis

Joshua Romoff

Pierre-Luc Bacon

Chris Pal

Read the article

Publications

Goal-conditioned GFlowNets for Controllable Multi-Objective Molecular Design

Julien Roy

Chris Pal

Emmanuel Bengio

In recent years, in-silico molecular design has received much attention from the machine learning community. When designing a new compound for pharmaceutical applications, there are usually multiple properties of such molecules that need to be optimised: binding energy to the target, synthesizability, toxicity, EC50, and so on. While previous approaches have employed a scalarization scheme to turn the multi-objective problem into a preference-conditioned single objective, it has been established that this kind of reduction may produce solutions that tend to slide towards the extreme points of the objective space when presented with a problem that exhibits a concave Pareto front. In this work we experiment with an alternative formulation of goal-conditioned molecular generation to obtain a more controllable conditional model that can uniformly explore solutions along the entire Pareto front.

2023-06-23

ICML.cc/2023/Workshop/DeployableGenerativeAI (published)

doi.org

openreview.net

Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier

Pierluca D'Oro

Max Schwarzer

Evgenii Nikishin

Marc Gendron-Bellemare

Aaron Courville

Increasing the replay ratio, the number of updates of an agent's parameters per environment interaction, is an appealing strategy for improving the sample efficiency of deep reinforcement learning algorithms. In this work, we show that fully or partially resetting the parameters of deep reinforcement learning agents causes better replay ratio scaling capabilities to emerge. We push the limits of the sample efficiency of carefully-modified algorithms by training them using an order of magnitude more updates than usual, significantly improving their performance in the Atari 100k and DeepMind Control Suite benchmarks. We then provide an analysis of the design choices required for favorable replay ratio scaling to be possible and discuss inherent limits and tradeoffs.

2023-02-01

ICLR.cc/2023/Conference (notable)

openreview.net

Block-State Transformers

Mahan Fathi

Jonathan Pilault

Chris Pal

Orhan Firat

Ross Goroshin

2023-01-01

NeurIPS (published)

arxiv.org

Options of Interest: Temporal Abstraction with Interest Functions

Khimya Khetarpal

Martin Klissarov

Maxime Chevalier-Boisvert