Aaron Courville

Razvan Ciuca

Research Master's - Université de Montréal

Alexandre Diz Ganito

Research Master's - UdeM

Juan Duque

PhD - UdeM

PhD - UdeM

PhD - UdeM

Uday Kapur

Professional Master's - UdeM

Amr Khalifa

PhD - UdeM

Samuel Lavoie

PhD - UdeM

Zhixuan Lin

PhD - UdeM

PhD - UdeM

Principal supervisor:

PhD - UdeM

PhD - UdeM

Co-supervisor:

andrei.nicolicioiu@gmail.com

Andrei Nicolicioiu

PhD - UdeM

Website

Google Scholar

Evgenii Nikishin

PhD - UdeM

Principal supervisor:

PhD - UdeM

Co-supervisor:

Johan Samir Obando Ceron

PhD - UdeM

Co-supervisor:

Research Master's - UdeM

pichedereck@gmail.com

Website

Esra'a Saleh

PhD - UdeM

Principal supervisor:

Research Master's - UdeM

Principal supervisor:

Anna (Cheng-Zhi) Huang

Shawn Tan

PhD - UdeM

PhD - UdeM

Principal supervisor:

(Rex) Devon Hjelm

Google Scholar

Yusong Wu

PhD - UdeM

Principal supervisor:

Anna (Cheng-Zhi) Huang

Dinghuai Zhang

PhD - UdeM

Co-supervisor:

PhD - UdeM

Hattie Zhou

PhD - UdeM

Principal supervisor:

Hugo Larochelle

Publications

Versatile Energy-Based Probabilistic Models for High Energy Physics

Taoli Cheng

Discovering the Electron Beam Induced Transition Rates for Silicon Dopants in Graphene with Deep Neural Networks in the STEM

Kevin M Roccapriore

Max Schwarzer

Joshua Greaves

Jesse Farebrother

Colton Bishop

Maxim Ziatdinov

Igor Mordatch

Ekin Dogus Cubuk

Pablo Samuel Castro

Marc Gendron-Bellemare

Sergei V Kalinin

2023-07-22

Microscopy and Microanalysis (published)

Meta-Value Learning: a General Framework for Learning with Learning Awareness

Tim Cooijmans

Milad Aghajohari

2023-07-17

ArXiv (preprint)

Bigger, Better, Faster: Human-level Atari with human-level efficiency

Max Schwarzer

Johan Samir Obando Ceron

Marc Gendron-Bellemare

Pablo Samuel Castro

We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at https://github.com/google-research/google-research/tree/master/bigger_better_faster.

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (published)

Learning with Learning Awareness using Meta-Values

Tim Cooijmans

Milad Aghajohari

2023-06-19

ICML.cc/2023/Workshop/Frontiers4LCD (published)

Let the Flows Tell: Solving Graph Combinatorial Optimization Problems with GFlowNets

Dinghuai Zhang

Hanjun Dai

Nikolay Malkin

Yoshua Bengio

Ling Pan

Combinatorial optimization (CO) problems are often NP-hard and thus out of reach for exact algorithms, making them a tempting domain to apply machine learning methods. The highly structured constraints in these problems can hinder either optimization or sampling directly in the solution space. On the other hand, GFlowNets have recently emerged as a powerful machinery to efficiently sample from composite unnormalized densities sequentially and have the potential to amortize such solution-searching processes in CO, as well as generate diverse solution candidates. In this paper, we design Markov decision processes (MDPs) for different combinatorial problems and propose to train conditional GFlowNets to sample from the solution space. Efficient training techniques are also developed to benefit long-range credit assignment. Through extensive experiments on a variety of different CO tasks with synthetic and realistic data, we demonstrate that GFlowNet policies can efficiently find high-quality solutions. Our implementation is open-sourced at https://github.com/zdhNarsil/GFlowNet-CombOpt.

2023-05-26

ArXiv (preprint)

arxiv.org

Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels

Sai Rajeswar

Pietro Mazzaglia

Tim Verbelen

Alexandre Piché

Bart Dhoedt

Alexandre Lacoste

Controlling artificial agents from visual sensory data is an arduous task. Reinforcement learning (RL) algorithms can succeed but require large amounts of interactions between the agent and the environment. To alleviate the issue, unsupervised RL proposes to employ self-supervised interaction and learning, for adapting faster to future tasks. Yet, as shown in the Unsupervised RL Benchmark (URLB; Laskin et al. 2021), whether current unsupervised strategies can improve generalization capabilities is still unclear, especially in visual control settings. In this work, we study the URLB and propose a new method to solve it, using unsupervised model-based RL for pre-training the agent, and a task-aware fine-tuning strategy combined with a newly proposed hybrid planner, Dyna-MPC, to adapt the agent for downstream tasks. On URLB, our method obtains 93.59% overall normalized performance, surpassing previous baselines by a staggering margin. The approach is empirically evaluated through a large-scale empirical study, which we use to validate our design choices and analyze our models. We also show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation. Project website: https://masteringurlb.github.io/

2023-04-24

ICML.cc/2023/Conference (poster)

proceedings.mlr.press

Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels

Sai Rajeswar

Pietro Mazzaglia

Tim Verbelen

Alexandre Piché

Bart Dhoedt

Alexandre Lacoste

2023-04-24

ICML.cc/2023/Conference (published)

Distributional GFlowNets with Quantile Flows

Dinghuai Zhang

Ling Pan

Ricky T. Q. Chen

Yoshua Bengio

Generative Flow Networks (GFlowNets) are a new family of probabilistic samplers where an agent learns a stochastic policy for generating complex combinatorial structure through a series of decision-making steps. Despite being inspired by reinforcement learning, the current GFlowNet framework is relatively limited in its applicability and cannot handle stochasticity in the reward function. In this work, we adopt a distributional paradigm for GFlowNets, turning each flow function into a distribution, thus providing more informative learning signals during training. By parameterizing each edge flow through its quantile function, our proposed quantile matching GFlowNet learning algorithm is able to learn a risk-sensitive policy, an essential component for handling scenarios with risk uncertainty. Moreover, we find that the distributional approach can achieve substantial improvement on existing benchmarks compared to prior methods due to our enhanced training algorithm, even in settings with deterministic rewards.

2023-02-11

ArXiv (preprint)

arxiv.org

Generative Augmented Flow Networks

Ling Pan

Dinghuai Zhang

Longbo Huang

Yoshua Bengio

The Generative Flow Network is a probabilistic framework where an agent learns a stochastic policy for object generation, such that the probability of generating an object is proportional to a given reward function. Its effectiveness has been shown in discovering high-quality and diverse solutions, compared to reward-maximizing reinforcement learning-based methods. Nonetheless, GFlowNets only learn from rewards of the terminal states, which can limit their applicability. Indeed, intermediate rewards play a critical role in learning, for example from intrinsic motivation to provide intermediate feedback even in particularly challenging sparse reward tasks. Inspired by this, we propose Generative Augmented Flow Networks (GAFlowNets), a novel learning framework to incorporate intermediate rewards into GFlowNets. We specify intermediate rewards by intrinsic motivation to tackle the exploration problem in sparse reward environments. GAFlowNets can leverage edge-based and state-based intrinsic rewards in a joint way to improve exploration. Based on extensive experiments on the GridWorld task, we demonstrate the effectiveness and efficiency of GAFlowNet in terms of convergence, performance, and diversity of solutions. We further show that GAFlowNet is scalable to a more complex and large-scale molecule generation domain, where it achieves consistent and significant performance improvement.

2023-02-01

ICLR.cc/2023/Conference (notable)

Investigating Multi-task Pretraining and Generalization in Reinforcement Learning

Adrien Ali Taiga

Jesse Farebrother