Doina Precup

Kushal Arora

PhD - McGill University

Principal supervisor :

Sumana Basu

PhD - McGill University

Co-supervisor :

Adriana Romero Soriano

PhD - McGill University

Lynn Cherif

Master's Research - McGill University

Raymond Chua

PhD - McGill University

Co-supervisor :

Blake Richards

Élodie Côté-Gauthier Coté-Gauthier

Wesley Chung

PhD - McGill University

Principal supervisor :

David Meger

Jonathan Colaço Carr

Master's Research - McGill University

Principal supervisor :

Prakash Panangaden

Research Intern - McGill University

Nathan de Lara

Research Intern - McGill University

Manuel Del Verme

PhD - McGill University

Jesse Farebrother

PhD - McGill University

Principal supervisor :

Marc Gendron-Bellemare

PhD - McGill University

Co-supervisor :

David Meger

Will Hua

Master's Research - McGill University

Co-supervisor :

Howard Huang

PhD - McGill University

Haque Ishfaq

PhD - McGill University

PhD - McGill University

Mohammad Sami Nur Islam Islam

Research Intern - McGill University

Arushi Jain

PhD - McGill University

Master's Research - Université de Montréal

Principal supervisor :

PhD - McGill University

Postdoctorate - McGill University

Elaine Lau

Master's Research - McGill University

Jonathan Lebensold

PhD - McGill University

Sitao Luan

PhD - McGill University

Ray Luo

PhD - McGill University

Principal supervisor :

G McCracken

PhD - McGill University

Jaume Minano Masip

PhD - McGill University

Shahrad Mohammadzadeh

Master's Research - McGill University

Principal supervisor :

Gabriela Moisescu-Pareja

Master's Research - McGill University

Girdhar Neil Girdhar

Collaborating researcher - McGill University

Padideh Nouri

Master's Research - Université de Montréal

Charles Onu

PhD - McGill University

PhD - McGill University

Co-supervisor :

Nate Rahn

PhD - McGill University

Principal supervisor :

Marc Gendron-Bellemare

Janarthanan Rajendran

Postdoctorate - Université de Montréal

Principal supervisor :

Sarath Chandar

Sahand Rezaei-Shoshtari

PhD - McGill University

Co-supervisor :

PhD - McGill University

Co-supervisor :

Blake Richards

samiemandana@gmail.com

Shishir Sharma

PhD - McGill University

Master's Research - McGill University

Nishanth Anand Vemgal

PhD - McGill University

PhD - McGill University

Priyesh Vijayan

PhD - McGill University

Co-supervisor :

Samira Ebrahimi Kahou

Research Intern - McGill University

Keyu Wang

Research Intern - McGill University

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

Steve Wen

Undergraduate - McGill University

Shuyuan Zhang

PhD - McGill University

Harry Zhao

PhD - McGill University

Co-supervisor :

Blog Posts

Generic thumbnail for Mila Blog articles.

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Read the article

Publications

Mitigating Downstream Model Risks via Model Provenance

Keyu Wang

Abdullah Norozi Iranzad

Scott Schaffter

Jonathan Lebensold

Research and industry are rapidly advancing the innovation and adoption of foundation model-based systems, yet the tools for managing these … (see more)models have not kept pace. Understanding the provenance and lineage of models is critical for researchers, industry, regulators, and public trust. While model cards and system cards were designed to provide transparency, they fall short in key areas: tracing model genealogy, enabling machine readability, offering reliable centralized management systems, and fostering consistent creation incentives. This challenge mirrors issues in software supply chain security, but AI/ML remains at an earlier stage of maturity. Addressing these gaps requires industry-standard tooling that can be adopted by foundation model publishers, open-source model innovators, and major distribution platforms. We propose a machine-readable model specification format to simplify the creation of model records, thereby reducing error-prone human effort, notably when a new model inherits most of its design from a foundation model. Our solution explicitly traces relationships between upstream and downstream models, enhancing transparency and traceability across the model lifecycle. To facilitate the adoption, we introduce the unified model record (UMR) repository , a semantically versioned system that automates the publication of model records to multiple formats (PDF, HTML, LaTeX) and provides a hosted web interface (https://modelrecord.com/). This proof of concept aims to set a new standard for managing foundation models, bridging the gap between innovation and responsible model management.

2024-10-03

ArXiv (preprint)

EnzymeFlow: Generating Reaction-specific Enzyme Catalytic Pockets through Flow Matching and Co-Evolutionary Dynamics

Chenqing Hua

Yong Liu

Dinghuai Zhang

Odin Zhang

Sitao Luan

Kevin K. Yang

Shuangjia Zheng

2024-10-01

ArXiv (preprint)

Training Language Models to Self-Correct via Reinforcement Learning

Aviral Kumar

Vincent Zhuang

Rishabh Agarwal

Yi Su

John D Co-Reyes

Avi Singh

Kate Baumli

Shariq N Iqbal

Colton Bishop

Rebecca Roelofs

Lei M Zhang

Kay McKinney

Disha Shrivastava

Cosmin Paduraru

George Tucker

Feryal Behbahani

Aleksandra Faust

Self-correction is a highly desirable capability of large language models (LLMs), yet it has consistently been found to be largely ineffecti… (see more)ve in modern LLMs. Existing approaches for training self-correction either require multiple models or rely on a more capable model or other forms of supervision. To this end, we develop a multi-turn online reinforcement learning (RL) approach, SCoRe, that significantly improves an LLM's self-correction ability using entirely self-generated data. To build SCoRe, we first show that variants of supervised fine-tuning (SFT) on offline model-generated correction traces are insufficient for instilling self-correction behavior. In particular, we observe that training via SFT either suffers from a distribution mismatch between the training data and the model's own responses or implicitly prefers only a certain mode of correction behavior that is often not effective at test time. SCoRe addresses these challenges by training under the model's own distribution of self-generated correction traces and using appropriate regularization to steer the learning process into learning a self-correction strategy that is effective at test time as opposed to simply fitting high-reward responses for a given prompt. This regularization prescribes running a first phase of RL on a base model to generate a policy initialization that is less susceptible to collapse and then using a reward bonus to amplify self-correction during training. When applied to Gemini 1.0 Pro and 1.5 Flash models, we find that SCoRe achieves state-of-the-art self-correction performance, improving the base models' self-correction by 15.6% and 9.1% respectively on the MATH and HumanEval benchmarks.

2024-09-19

ArXiv (preprint)

An Attentive Approach for Building Partial Reasoning Agents from Pixels

Safa Alver

We study the problem of building reasoning agents that are able to generalize in an effective manner. Towards this goal, we propose an end-t… (see more)o-end approach for building model-based reinforcement learning agents that dynamically focus their reasoning to the relevant aspects of the environment: after automatically identifying the distinct aspects of the environment, these agents dynamically filter out the relevant ones and then pass them to their simulator to perform partial reasoning. Unlike existing approaches, our approach works with pixel-based inputs and it allows for interpreting the focal points of the agent. Our quantitative analyses show that the proposed approach allows for effective generalization in high-dimensional domains with raw observational inputs. We also perform ablation analyses to validate our design choices. Finally, we demonstrate through qualitative analyses that our approach actually allows for building agents that focus their reasoning on the relevant aspects of the environment.

2024-09-17

TMLR (accepted)

An Attentive Approach for Building Partial Reasoning Agents from Pixels

Safa Alver

We study the problem of building reasoning agents that are able to generalize in an effective manner. Towards this goal, we propose an end-t… (see more)o-end approach for building model-based reinforcement learning agents that dynamically focus their reasoning to the relevant aspects of the environment: after automatically identifying the distinct aspects of the environment, these agents dynamically filter out the relevant ones and then pass them to their simulator to perform partial reasoning. Unlike existing approaches, our approach works with pixel-based inputs and it allows for interpreting the focal points of the agent. Our quantitative analyses show that the proposed approach allows for effective generalization in high-dimensional domains with raw observational inputs. We also perform ablation analyses to validate of design choices. Finally, we demonstrate through qualitative analyses that our approach actually allows for building agents that focus their reasoning on the relevant aspects of the environment.

2024-09-17

TMLR (accepted)

Reactzyme: A Benchmark for Enzyme-Reaction Prediction

Chenqing Hua

Bozitao Zhong

Sitao Luan

Liang Hong

Shuangjia Zheng

2024-08-24

ArXiv (preprint)

Adaptive Exploration for Data-Efficient General Value Function Evaluations

Arushi Jain

Josiah P. Hanna

General Value Functions (GVFs) (Sutton et al, 2011) are an established way to represent predictive knowledge in reinforcement learning. Each… (see more) GVF computes the expected return for a given policy, based on a unique pseudo-reward. Multiple GVFs can be estimated in parallel using off-policy learning from a single stream of data, often sourced from a fixed behavior policy or pre-collected dataset. This leaves an open question: how can behavior policy be chosen for data-efficient GVF learning? To address this gap, we propose GVFExplorer, which aims at learning a behavior policy that efficiently gathers data for evaluating multiple GVFs in parallel. This behavior policy selects actions in proportion to the total variance in the return across all GVFs, reducing the number of environmental interactions. To enable accurate variance estimation, we use a recently proposed temporal-difference-style variance estimator. We prove that each behavior policy update reduces the mean squared error in the summed predictions over all GVFs. We empirically demonstrate our method's performance in both tabular representations and nonlinear function approximation.

2024-08-01

EWRL/2024/Workshop (accepted)

The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges

Sitao Luan

Chenqing Hua

Qincheng Lu

Liheng Ma

Lirong Wu

Xinyu Wang

Minkai Xu

Xiao-Wen Chang

Rex Ying

Stan Z. Li

Jian Tang

Stefanie Jegelka

Homophily principle, \ie{} nodes with the same labels or similar attributes are more likely to be connected, has been commonly believed to b… (see more)e the main reason for the superiority of Graph Neural Networks (GNNs) over traditional Neural Networks (NNs) on graph-structured data, especially on node-level tasks. However, recent work has identified a non-trivial set of datasets where GNN's performance compared to the NN's is not satisfactory. Heterophily, i.e. low homophily, has been considered the main cause of this empirical observation. People have begun to revisit and re-evaluate most existing graph models, including graph transformer and its variants, in the heterophily scenario across various kinds of graphs, e.g. heterogeneous graphs, temporal graphs and hypergraphs. Moreover, numerous graph-related applications are found to be closely related to the heterophily problem. In the past few years, considerable effort has been devoted to studying and addressing the heterophily issue. In this survey, we provide a comprehensive review of the latest progress on heterophilic graph learning, including an extensive summary of benchmark datasets and evaluation of homophily metrics on synthetic graphs, meticulous classification of the most updated supervised and unsupervised learning methods, thorough digestion of the theoretical analysis on homophily/heterophily, and broad exploration of the heterophily-related applications. Notably, through detailed experiments, we are the first to categorize benchmark heterophilic datasets into three sub-categories: malignant, benign and ambiguous heterophily. Malignant and ambiguous datasets are identified as the real challenging datasets to test the effectiveness of new models on the heterophily challenge. Finally, we propose several challenges and future directions for heterophilic graph representation learning.

2024-07-12

ArXiv (preprint)

Functional Acceleration for Policy Mirror Descent

Veronica Chelu

2024-06-19

ICML.cc/2024/Workshop/ARLET (poster)

Functional Acceleration for Policy Mirror Descent

Veronica Chelu

We apply functional acceleration to the Policy Mirror Descent (PMD) general family of algorithms, which cover a wide range of novel and fund… (see more)amental methods in Reinforcement Learning (RL). Leveraging duality, we propose a momentum-based PMD update. By taking the functional route, our approach is independent of the policy parametrization and applicable to large-scale optimization, covering previous applications of momentum at the level of policy parameters as a special case. We theoretically analyze several properties of this approach and complement with a numerical ablation study, which serves to illustrate the policy optimization dynamics on the value polytope, relative to different algorithmic design choices in this space. We further characterize numerically several features of the problem setting relevant for functional acceleration, and lastly, we investigate the impact of approximation on their learning mechanics.

2024-06-19

ICML.cc/2024/Workshop/ARLET (poster)

More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling

Haque Ishfaq

Yixin Tan

Yu Yang

Qingfeng Lan

Jianfeng Lu

A. Rupam Mahmood

Pan Xu

2024-06-18

ArXiv (preprint)

QGFN: Controllable Greediness with Action Values

Elaine Lau

Stephen Zhewen Lu

Ling Pan

Emmanuel Bengio

Generative Flow Networks (GFlowNets; GFNs) are a family of reward/energy-based generative methods for combinatorial objects, capable of gene… (see more)rating diverse and high-utility samples. However, biasing GFNs towards producing high-utility samples is non-trivial. In this work, we leverage connections between GFNs and reinforcement learning (RL) and propose to combine the GFN policy with an action-value estimate,

2024-06-17

ICML.cc/2024/Workshop/SPIGM (poster)