Aaron Courville

Reza Bayat

PhD - Université de Montréal

Co-supervisor :

Pascal Vincent

Anirudh Buvanesh

PhD - Université de Montréal

Principal supervisor :

Laurent Charlin

anirudb1102@gmail.com

Razvan Ciuca

Master's Research - Université de Montréal

Alexandre Diz Ganito

Master's Research - Université de Montréal

Juan Duque

PhD - Université de Montréal

PhD - Université de Montréal

Arian Hosseini

PhD - Université de Montréal

Uday Kapur

Professional Master's - Université de Montréal

Amr Khalifa

PhD - Université de Montréal

andrei.nicolicioiu@gmail.com

Samuel Lavoie

PhD - Université de Montréal

Zhixuan Lin

PhD - Université de Montréal

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

PhD - Université de Montréal

Co-supervisor :

Rishabh Agarwal

Andrei Nicolicioiu

PhD - Université de Montréal

Evgenii Nikishin

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

Co-supervisor :

Johan Samir Obando Ceron

PhD - Université de Montréal

Co-supervisor :

Master's Research - Université de Montréal

pichedereck@gmail.com

Esra'a Saleh

PhD - Université de Montréal

Principal supervisor :

Master's Research - Université de Montréal

Principal supervisor :

Anna (Cheng-Zhi) Huang

Shawn Tan

PhD - Université de Montréal

PhD - Université de Montréal

Principal supervisor :

(Rex) Devon Hjelm

Yusong Wu

PhD - Université de Montréal

Principal supervisor :

Anna (Cheng-Zhi) Huang

Xiaofeng Zhang

PhD - Université de Montréal

Dinghuai Zhang

PhD - Université de Montréal

Co-supervisor :

Yoshua Bengio

Hattie Zhou

PhD - Université de Montréal

Principal supervisor :

Hugo Larochelle

Publications

Group Robust Classification Without Any Group Information

Christos Tsirigotis

Joao Monteiro

Pau Rodriguez

David Vazquez

Improving Compositional Generalization using Iterated Learning and Simplicial Embeddings

Yi Ren

Samuel Lavoie

Mikhail Galkin

Danica J. Sutherland

Language Model Alignment with Elastic Reset

Michael Noukhovitch

Samuel Lavoie

Florian Strub

Finetuning language models with reinforcement learning (RL), e.g. from human feedback (HF), is a prominent method for alignment. But optimizing against a reward model can improve on reward while degrading performance in other areas, a phenomenon known as reward hacking, alignment tax, or language drift. First, we argue that commonly-used test metrics are insufficient and instead measure how different algorithms trade off between reward and drift. The standard method modifies the reward with a Kullback-Leibler (KL) penalty between the online and initial model. We propose Elastic Reset, a new algorithm that achieves higher reward with less drift without explicitly modifying the training objective. We periodically reset the online model to an exponential moving average (EMA) of itself, then reset the EMA model to the initial model. Through the use of an EMA, our model recovers quickly after resets and achieves higher reward with less drift in the same number of steps. We demonstrate that fine-tuning language models with Elastic Reset leads to state-of-the-art performance on a small-scale pivot-translation benchmark, outperforms all baselines in a medium-scale RLHF-like IMDB mock sentiment task and leads to a more performant and more aligned technical QA chatbot with LLaMA-7B. Code available at github.com/mnoukhov/elastic-reset.
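The reset schedule described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation (that is in the repository linked above). Parameters are modeled as plain lists of floats, and the names `ema_update`, `elastic_reset_train`, `train_step`, `reset_every`, and `decay` are hypothetical, chosen only for illustration. The sketch also assumes the EMA starts as a copy of the initial model.

```python
from typing import Callable, List, Tuple

Params = List[float]  # stand-in for model weights (assumption for illustration)


def ema_update(ema: Params, online: Params, decay: float) -> Params:
    """Move the EMA weights a fraction (1 - decay) toward the online weights."""
    return [decay * e + (1.0 - decay) * o for e, o in zip(ema, online)]


def elastic_reset_train(
    initial: Params,
    train_step: Callable[[Params], Params],  # hypothetical: one RL update
    num_steps: int,
    reset_every: int,
    decay: float = 0.99,
) -> Tuple[Params, Params]:
    """Run training with periodic resets, following the abstract's description.

    Every `reset_every` steps:
      1. the online model is reset to the EMA of itself, then
      2. the EMA model is reset to the initial model.
    """
    online = list(initial)
    ema = list(initial)  # assumption: EMA is initialized from the initial model
    for step in range(1, num_steps + 1):
        online = train_step(online)
        ema = ema_update(ema, online, decay)
        if step % reset_every == 0:
            online = list(ema)     # reset online model to its EMA
            ema = list(initial)    # reset EMA model to the initial model
    return online, ema


if __name__ == "__main__":
    # Toy "training step" that just adds 1.0 to every weight.
    final_online, final_ema = elastic_reset_train(
        initial=[0.0],
        train_step=lambda p: [x + 1.0 for x in p],
        num_steps=2,
        reset_every=2,
        decay=0.5,
    )
    print(final_online, final_ema)
```

Note that the training objective inside `train_step` is left untouched; the only intervention is on the weights, which matches the abstract's claim that the method works without explicitly modifying the objective.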

Let the Flows Tell: Solving Graph Combinatorial Problems with GFlowNets

Dinghuai Zhang

Hanjun Dai

Nikolay Malkin

Yoshua Bengio

Ling Pan

Versatile Energy-Based Probabilistic Models for High Energy Physics

Taoli Cheng

Discovering the Electron Beam Induced Transition Rates for Silicon Dopants in Graphene with Deep Neural Networks in the STEM

Kevin M Roccapriore

Max Schwarzer

Joshua Greaves

Jesse Farebrother

Rishabh Agarwal

Colton Bishop

Maxim Ziatdinov

Igor Mordatch

Ekin Dogus Cubuk

Pablo Samuel Castro

Marc Gendron-Bellemare

Sergei V Kalinin

2023-07-22

Microscopy and Microanalysis (published)

Meta-Value Learning: a General Framework for Learning with Learning Awareness

Tim Cooijmans

Milad Aghajohari

2023-07-17

ArXiv (preprint)

Bigger, Better, Faster: Human-level Atari with human-level efficiency

Max Schwarzer

Johan Samir Obando Ceron

Marc Gendron-Bellemare

Rishabh Agarwal

Pablo Samuel Castro

We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at https://github.com/google-research/google-research/tree/master/bigger_better_faster.

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (published)

Learning with Learning Awareness using Meta-Values

Tim Cooijmans

Milad Aghajohari

2023-06-19

ICML.cc/2023/Workshop/Frontiers4LCD (published)

Let the Flows Tell: Solving Graph Combinatorial Optimization Problems with GFlowNets

Dinghuai Zhang

Hanjun Dai

Nikolay Malkin

Yoshua Bengio

Ling Pan

Combinatorial optimization (CO) problems are often NP-hard and thus out of reach for exact algorithms, making them a tempting domain to apply machine learning methods. The highly structured constraints in these problems can hinder either optimization or sampling directly in the solution space. On the other hand, GFlowNets have recently emerged as a powerful machinery to efficiently sample from composite unnormalized densities sequentially and have the potential to amortize such solution-searching processes in CO, as well as generate diverse solution candidates. In this paper, we design Markov decision processes (MDPs) for different combinatorial problems and propose to train conditional GFlowNets to sample from the solution space. Efficient training techniques are also developed to benefit long-range credit assignment. Through extensive experiments on a variety of different CO tasks with synthetic and realistic data, we demonstrate that GFlowNet policies can efficiently find high-quality solutions. Our implementation is open-sourced at https://github.com/zdhNarsil/GFlowNet-CombOpt.

2023-05-26

ArXiv (preprint)


Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels

Sai Rajeswar

Pietro Mazzaglia

Tim Verbelen

Alexandre Piché

Bart Dhoedt

Alexandre Lacoste

2023-04-24

ICML.cc/2023/Conference (published)

Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels

Sai Rajeswar

Pietro Mazzaglia

Tim Verbelen

Alexandre Piché

Bart Dhoedt

Alexandre Lacoste

Controlling artificial agents from visual sensory data is an arduous task. Reinforcement learning (RL) algorithms can succeed but require large amounts of interactions between the agent and the environment. To alleviate the issue, unsupervised RL proposes to employ self-supervised interaction and learning, for adapting faster to future tasks. Yet, as shown in the Unsupervised RL Benchmark (URLB; Laskin et al. 2021), whether current unsupervised strategies can improve generalization capabilities is still unclear, especially in visual control settings. In this work, we study the URLB and propose a new method to solve it, using unsupervised model-based RL, for pre-training the agent, and a task-aware fine-tuning strategy combined with a newly proposed hybrid planner, Dyna-MPC, to adapt the agent for downstream tasks. On URLB, our method obtains 93.59% overall normalized performance, surpassing previous baselines by a staggering margin. The approach is empirically evaluated through a large-scale empirical study, which we use to validate our design choices and analyze our models. We also show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation. Project website: https://masteringurlb.github.io/

2023-04-24

ICML.cc/2023/Conference (poster)
