Doina Precup

Sumana Basu

Collaborating Alumni - McGill University

Co-supervisor :

Adriana Romero Soriano

Collaborating Alumni - McGill University

Lynn Cherif

Collaborating Alumni - McGill University

Co-supervisor :

PhD - McGill University

Co-supervisor :

PhD - McGill University

Principal supervisor :

David Meger

Jonathan Colaço Carr

Master's Research - McGill University

Principal supervisor :

Prakash Panangaden

Élodie Coté-Gauthier

Collaborating researcher - McGill University

Co-supervisor :

Isabeau Prémont-Schwarz

Franco Del Balso

Collaborating researcher - Université de Montréal

PhD - McGill University

Principal supervisor :

Marc Gendron-Bellemare

PhD - McGill University

Principal supervisor :

Collaborating researcher - Birla Institute of Technology

Howard Huang

PhD - McGill University

Haque Ishfaq

Collaborating Alumni - McGill University

Mohammad Sami Nur Islam Islam

Master's Research - McGill University

Arushi Jain

Collaborating Alumni - McGill University

PhD - Polytechnique Montréal

Martin Klissarov

PhD - McGill University

Postdoctorate - McGill University

Jonathan Lebensold

Collaborating Alumni - McGill University

Collaborating Alumni - McGill University

Ray Luo

PhD - McGill University

Principal supervisor :

G McCracken

PhD - McGill University

Nazanin Mohammadi Sepahvand

Collaborating Alumni - McGill University

Shahrad Mohammadzadeh

Master's Research - McGill University

Principal supervisor :

Gabriela Moisescu-Pareja

Collaborating researcher - McGill University

Co-supervisor :

Irina Rish

Padideh Nouri

PhD - Université de Montréal

Co-supervisor :

PhD - McGill University

Co-supervisor :

Nate Rahn

PhD - McGill University

Principal supervisor :

Marc Gendron-Bellemare

Sahand Rezaei-Shoshtari

PhD - McGill University

Co-supervisor :

PhD - McGill University

Co-supervisor :

PhD - McGill University

Co-supervisor :

PhD - McGill University

Nishanth Anand Vemgal

PhD - McGill University

PhD - McGill University

Co-supervisor :

Samira Ebrahimi Kahou

Zihan Wang

PhD - McGill University

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

Guangyuan Wang

Research Intern - McGill University

Steve Wen

Master's Research - McGill University

Co-supervisor :

Gregory Dudek

Zijing Wu

PhD - McGill University

Principal supervisor :

PhD - McGill University

Harry Zhao

Collaborating Alumni - McGill University

Co-supervisor :

Blog Posts

Generic thumbnail for Mila Blog articles.

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Read the article

Publications

Training Matters: Unlocking Potentials of Deeper Graph Convolutional Neural Networks

Sitao Luan

Mingde Zhao

Xiao-Wen Chang

2024-02-20

Complex Networks & Their Applications XII (published)

When Do We Need Graph Neural Networks for Node Classification?

Sitao Luan

Qincheng Lu

Jiaqi Zhu

Xiao-Wen Chang

2024-02-20

Complex Networks & Their Applications XII (published)

Discrete Probabilistic Inference as Control in Multi-path Environments

We consider the problem of sampling from a discrete and structured distribution as a sequential decision problem, where the objective is to … (see more)find a stochastic policy such that objects are sampled at the end of this sequential process proportionally to some predefined reward. While we could use maximum entropy Reinforcement Learning (MaxEnt RL) to solve this problem for some distributions, it has been shown that in general, the distribution over states induced by the optimal policy may be biased in cases where there are multiple ways to generate the same object. To address this issue, Generative Flow Networks (GFlowNets) learn a stochastic policy that samples objects proportionally to their reward by approximately enforcing a conservation of flows across the whole Markov Decision Process (MDP). In this paper, we extend recent methods correcting the reward in order to guarantee that the marginal distribution induced by the optimal MaxEnt RL policy is proportional to the original reward, regardless of the structure of the underlying MDP. We also prove that some flow-matching objectives found in the GFlowNet literature are in fact equivalent to well-established MaxEnt RL algorithms with a corrected reward. Finally, we study empirically the performance of multiple MaxEnt RL and GFlowNet algorithms on multiple problems involving sampling from discrete distributions.

2024-02-15

ArXiv (preprint)

Mixtures of Experts Unlock Parameter Scaling for Deep RL

Johan Samir Obando Ceron

Ghada Sokar

Timon Willi

Clare Lyle

Gintare Karolina Dziugaite

Jakob Nicolaus Foerster

Pablo Samuel Castro

2024-02-13

ArXiv (preprint)

Mixtures of Experts Unlock Parameter Scaling for Deep RL

Johan Samir Obando Ceron

Ghada Sokar

Timon Willi

Clare Lyle

Gintare Karolina Dziugaite

Jakob Nicolaus Foerster

Pablo Samuel Castro

The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance s… (see more)cales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.

2024-02-13

ArXiv (preprint)

Mixtures of Experts Unlock Parameter Scaling for Deep RL

Johan Samir Obando Ceron

Ghada Sokar

Timon Willi

Clare Lyle

Gintare Karolina Dziugaite

Jakob Nicolaus Foerster

Pablo Samuel Castro

2024-02-13

ArXiv (preprint)

On the Privacy of Selection Mechanisms with Gaussian Noise

Jonathan Lebensold

Borja Balle

2024-02-09

ArXiv (preprint)

Code as Reward: Empowering Reinforcement Learning with VLMs

David Venuto

Mohammad Sami Nur Islam

Martin Klissarov

Sherry Yang

Ankit Anand

Pre-trained Vision-Language Models (VLMs) are able to understand visual concepts, describe and decompose complex tasks into sub-tasks, and p… (see more)rovide feedback on task completion. In this paper, we aim to leverage these capabilities to support the training of reinforcement learning (RL) agents. In principle, VLMs are well suited for this purpose, as they can naturally analyze image-based observations and provide feedback (reward) on learning progress. However, inference in VLMs is computationally expensive, so querying them frequently to compute rewards would significantly slowdown the training of an RL agent. To address this challenge, we propose a framework named Code as Reward (VLM-CaR). VLM-CaR produces dense reward functions from VLMs through code generation, thereby significantly reducing the computational burden of querying the VLM directly. We show that the dense rewards generated through our approach are very accurate across a diverse set of discrete and continuous environments, and can be more effective in training RL policies than the original sparse environment rewards.

2024-02-07

ArXiv (preprint)

Effective Protein-Protein Interaction Exploration with PPIretrieval

Connor W. Coley

Guy Wolf

Shuangjia Zheng

Protein-protein interactions (PPIs) are crucial in regulating numerous cellular functions, including signal transduction, transportation, an… (see more)d immune defense. As the accuracy of multi-chain protein complex structure prediction improves, the challenge has shifted towards effectively navigating the vast complex universe to identify potential PPIs. Herein, we propose PPIretrieval, the first deep learning-based model for protein-protein interaction exploration, which leverages existing PPI data to effectively search for potential PPIs in an embedding space, capturing rich geometric and chemical information of protein surfaces. When provided with an unseen query protein with its associated binding site, PPIretrieval effectively identifies a potential binding partner along with its corresponding binding site in an embedding space, facilitating the formation of protein-protein complexes.

2024-02-06

ArXiv (preprint)

Effective Protein-Protein Interaction Exploration with PPIretrieval

Connor W. Coley

Guy Wolf

Shuangjia Zheng

2024-02-06

ArXiv (preprint)

Effective Protein-Protein Interaction Exploration with PPIretrieval

Connor Coley

Guy Wolf

Shuangjia Zheng

2024-02-06

ArXiv (preprint)

Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning

Harry Zhao 0001

Harry Zhao

Mingde Zhao

Safa Alver

Harm van Seijen

Romain Laroche