Doina Precup

Sumana Basu

PhD - McGill

Co-supervisor:

Adriana Romero Soriano

Alumni collaborator - McGill

Lynn Cherif

Research Master's - McGill

Co-supervisor:

PhD - McGill

Co-supervisor:

PhD - McGill

Principal supervisor:

David Meger

Jonathan Colaço Carr

Research Master's - McGill

Principal supervisor:

Prakash Panangaden

Élodie Coté-Gauthier

Research collaborator - McGill

Co-supervisor:

Isabeau Prémont-Schwarz

Franco Del Balso

Research intern - UdeM

Jesse Farebrother

PhD - McGill

Principal supervisor:

Marc Gendron-Bellemare

PhD - McGill

Principal supervisor:

PhD - McGill

Alumni collaborator - McGill

Mohammad Sami Nur Islam Islam

Research Master's - McGill

Arushi Jain

Alumni collaborator - McGill

PhD - Polytechnique

Flemming Kondrup

Postdoc - McGill

Elaine Lau

Research Master's - McGill

Jonathan Lebensold

Alumni collaborator - McGill

Undergraduate - McGill

Ray Luo

PhD - McGill

Principal supervisor:

G McCracken

PhD - McGill

Nazanin Mohammadi Sepahvand

Alumni collaborator - McGill

Shahrad Mohammadzadeh

Research Master's - McGill

Principal supervisor:

Gabriela Moisescu-Pareja

Research collaborator - McGill

Co-supervisor:

Irina Rish

Padideh Nouri

PhD - UdeM

Co-supervisor:

PhD - McGill

Co-supervisor:

Nate Rahn

PhD - McGill

Principal supervisor:

Marc Gendron-Bellemare

Sahand Rezaei-Shoshtari

PhD - McGill

Co-supervisor:

PhD - McGill

Co-supervisor:

PhD - McGill

Co-supervisor:

PhD - McGill

Nishanth Anand Vemgal

PhD - McGill

PhD - McGill

Co-supervisor:

Samira Ebrahimi Kahou

Zihan Wang

PhD - McGill

Skipper: Combining spatial and temporal abstraction to improve generalization

Guangyuan Wang

Research intern - McGill

Steve Wen

Research Master's - McGill

Co-supervisor:

Gregory Dudek

Zijing Wu

PhD - McGill

Principal supervisor:

PhD - McGill

Harry Zhao

Alumni collaborator - McGill

Co-supervisor:

Blog posts

February 22, 2024

by

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Publications

Soft Condorcet Optimization for Ranking of General Agents

Marc Lanctot

Kate Larson

Michael Kaisers

Quentin Berthet

Ian Gemp

Manfred Diaz

Roberto-Rafael Maura-Rivero

Yoram Bachrach

Anna Koop

A common way to drive progress of AI models and agents is to compare their performance on standardized benchmarks. Comparing the performance of general agents requires aggregating their individual performances across a potentially wide variety of different tasks. In this paper, we describe a novel ranking scheme inspired by social choice frameworks, called Soft Condorcet Optimization (SCO), to compute the optimal ranking of agents: the one that makes the fewest mistakes in predicting the agent comparisons in the evaluation data. This optimal ranking is the maximum likelihood estimate when evaluation data (which we view as votes) are interpreted as noisy samples from a ground truth ranking, a solution to Condorcet's original voting system criteria. SCO ratings are maximal for Condorcet winners when they exist, which we show is not necessarily true for the classical rating system Elo. We propose three optimization algorithms to compute SCO ratings and evaluate their empirical performance. When serving as an approximation to the Kemeny-Young voting method, SCO rankings are on average 0 to 0.043 away from the optimal ranking in normalized Kendall-tau distance across 865 preference profiles from the PrefLib open ranking archive. In a simulated noisy tournament setting, SCO achieves accurate approximations to the ground truth ranking and the best among several baselines when 59% or more of the preference data is missing. Finally, SCO ranking provides the best approximation to the optimal ranking, measured on held-out test sets, in a problem containing 52,958 human players across 31,049 games of the classic seven-player game of Diplomacy.
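
The normalized Kendall-tau distance mentioned in the abstract above can be illustrated with a minimal sketch. This is the standard textbook definition (the fraction of item pairs that two rankings order differently), not code from the paper; the function name and the list-based ranking representation are assumptions made for illustration.

```python
from itertools import combinations


def normalized_kendall_tau(ranking_a, ranking_b):
    """Return the fraction of item pairs ordered differently by two rankings.

    Each ranking is a list of the same distinct items, best first.
    0.0 means identical rankings; 1.0 means one is the reverse of the other.
    """
    if len(set(ranking_a)) != len(ranking_a) or len(set(ranking_b)) != len(ranking_b):
        raise ValueError("rankings must not contain duplicates")
    if set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must contain the same items")

    # Position of every item in each ranking.
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}

    pairs = list(combinations(ranking_a, 2))
    if not pairs:
        return 0.0  # Zero or one item: nothing to disagree about.

    # A pair is discordant when the two rankings order it in opposite ways.
    discordant = sum(
        1
        for x, y in pairs
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )
    return discordant / len(pairs)
```

For three agents, swapping the top two positions changes one of the three pairs, so the distance between `["a", "b", "c"]` and `["b", "a", "c"]` is one third.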

2024-10-31

arXiv (preprint)

arxiv.org

Effective Protein-Protein Interaction Exploration with PPIretrieval

Chenqing Hua

Connor W. Coley

Guy Wolf

Shuangjia Zheng

2024-10-13

NeurIPS.cc/2024/Workshop/AIDrugX (poster)

EnzymeFlow: Generating Reaction-specific Enzyme Catalytic Pockets through Flow Matching and Co-Evolutionary Dynamics

Yang Liu

Odin Zhang

Kevin K Yang

Shuangjia Zheng

2024-10-13

NeurIPS.cc/2024/Workshop/AIDrugX (poster)

Identifying and Addressing Delusions for Target-Directed Decision-Making

Mingde Zhao

Romain Laroche

We are interested in target-directed agents, which produce targets during decision-time planning, to guide their behaviors and achieve better generalization during evaluation. Improper training of these agents can result in delusions: the agent may come to hold false beliefs about the targets, which cannot be properly rejected, leading to unwanted behaviors and damaging out-of-distribution generalization. We identify different types of delusions by using intuitive examples in carefully controlled environments, and investigate their causes. We demonstrate how delusions can be addressed for agents trained by hindsight relabeling, a mainstream approach for training target-directed RL agents. We validate empirically the effectiveness of the proposed solutions in correcting delusional behaviors and improving out-of-distribution generalization.

2024-10-12

NeurIPS.cc/2024/Workshop/SafeGenAi (poster)

Mitigating Downstream Model Risks via Model Provenance

Keyu Wang

Abdullah Norozi Iranzad

Scott Schaffter

Jonathan Lebensold

Meg Risdal

Research and industry are rapidly advancing the innovation and adoption of foundation model-based systems, yet the tools for managing these models have not kept pace. Understanding the provenance and lineage of models is critical for researchers, industry, regulators, and public trust. While model cards and system cards were designed to provide transparency, they fall short in key areas: tracing model genealogy, enabling machine readability, offering reliable centralized management systems, and fostering consistent creation incentives. This challenge mirrors issues in software supply chain security, but AI/ML remains at an earlier stage of maturity. Addressing these gaps requires industry-standard tooling that can be adopted by foundation model publishers, open-source model innovators, and major distribution platforms. We propose a machine-readable model specification format to simplify the creation of model records, thereby reducing error-prone human effort, notably when a new model inherits most of its design from a foundation model. Our solution explicitly traces relationships between upstream and downstream models, enhancing transparency and traceability across the model lifecycle. To facilitate adoption, we introduce the unified model record (UMR) repository, a semantically versioned system that automates the publication of model records to multiple formats (PDF, HTML, LaTeX) and provides a hosted web interface (https://modelrecord.com/). This proof of concept aims to set a new standard for managing foundation models, bridging the gap between innovation and responsible model management.

2024-10-09

NeurIPS.cc/2024/Workshop/SoLaR (poster)

Rejecting Hallucinated State Targets during Planning

Mingde Zhao

Tristan Sylvain

Romain Laroche

Yoshua Bengio

2024-10-09

arXiv (preprint)

arxiv.org

Reactzyme: A Benchmark for Enzyme-Reaction Prediction

Bozitao Zhong

Liang Hong

Shuangjia Zheng

2024-09-26

NeurIPS.cc/2024/Datasets_and_Benchmarks_Track (poster)

Adaptive Exploration for Data-Efficient General Value Function Evaluations

Arushi Jain

Josiah P. Hanna

General Value Functions (GVFs) (Sutton et al., 2011) are an established way to represent predictive knowledge in reinforcement learning. Each GVF computes the expected return for a given policy, based on a unique pseudo-reward. Multiple GVFs can be estimated in parallel using off-policy learning from a single stream of data, often sourced from a fixed behavior policy or pre-collected dataset. This leaves an open question: how can a behavior policy be chosen for data-efficient GVF learning? To address this gap, we propose GVFExplorer, which aims at learning a behavior policy that efficiently gathers data for evaluating multiple GVFs in parallel. This behavior policy selects actions in proportion to the total variance in the return across all GVFs, reducing the number of environmental interactions. To enable accurate variance estimation, we use a recently proposed temporal-difference-style variance estimator. We prove that each behavior policy update reduces the mean squared error in the summed predictions over all GVFs. We empirically demonstrate our method's performance in both tabular representations and nonlinear function approximation.
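
The idea of selecting actions in proportion to the total return variance across GVFs can be sketched as follows. This is only a literal reading of that one sentence of the abstract, not the GVFExplorer algorithm itself: the function name, the nested-list input layout, and the uniform fallback when all variances are zero are assumptions made for illustration, and the variance estimates are taken as given.

```python
def variance_proportional_policy(variances):
    """Turn per-GVF variance estimates into action probabilities for one state.

    variances[g][a] is the (assumed, externally estimated) return variance of
    GVF g when taking action a in the current state. The returned list gives
    each action a probability proportional to the variance summed over GVFs.
    """
    if not variances or not variances[0]:
        raise ValueError("need at least one GVF and one action")
    n_actions = len(variances[0])
    if any(len(row) != n_actions for row in variances):
        raise ValueError("every GVF must provide one variance per action")

    # Total variance per action, summed across all GVFs.
    totals = [sum(row[a] for row in variances) for a in range(n_actions)]
    normalizer = sum(totals)

    # Hypothetical fallback: act uniformly when no variance is estimated.
    if normalizer <= 0:
        return [1.0 / n_actions] * n_actions
    return [total / normalizer for total in totals]
```

With two GVFs whose variance estimates are `[1.0, 3.0]` each, the second action carries three quarters of the total variance and is therefore sampled three times as often as the first.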

2024-09-25

NeurIPS.cc/2024/Conference (poster)

Efficient Reinforcement Learning by Discovering Neural Pathways

Samin Yeasar Arnob

Riyasat Ohib

Sergey Plis

Amy Zhang

Alessandro Sordoni

Reinforcement learning (RL) algorithms have been very successful at tackling complex control problems, such as AlphaGo or fusion control. However, current research mainly emphasizes solution quality, often achieved by using large models trained on large amounts of data, and does not account for the financial, environmental, and societal costs associated with developing and deploying such models. Modern neural networks are often overparameterized and a significant number of parameters can be pruned without meaningful loss in performance, resulting in more efficient use of the model's capacity (lottery ticket). We present a methodology for identifying sub-networks within a larger network in reinforcement learning (RL). We call such sub-networks neural pathways. We show empirically that even very small learned sub-networks, using less than 5% of the large network's parameters, can provide very good quality solutions. We also demonstrate the training of multiple pathways within the same networks in a multitask setup, where each pathway is encouraged to tackle a separate task. We empirically evaluate our approach on several continuous control tasks, in both online and offline training.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

Learning Successor Features the Simple Way

Christos Kaplanis

In Deep Reinforcement Learning (RL), it is a challenge to learn representations that do not exhibit catastrophic forgetting or interference in non-stationary environments. Successor Features (SFs) offer a potential solution to this challenge. However, canonical techniques for learning SFs from pixel-level observations often lead to representation collapse, wherein representations degenerate and fail to capture meaningful variations in the data. More recent methods for learning SFs can avoid representation collapse, but they often involve complex losses and multiple learning phases, reducing their efficiency. We introduce a novel, simple method for learning SFs directly from pixels. Our approach uses a combination of a Temporal-difference (TD) loss and a reward prediction loss, which together capture the basic mathematical definition of SFs. We show that our approach matches or outperforms existing SF learning techniques in 2D (Minigrid) mazes, 3D (Miniworld) mazes, and MuJoCo, for both single and continual learning scenarios. Our technique is also efficient, and can reach higher levels of performance in less time than other approaches. Our work provides a new, streamlined technique for learning SFs directly from pixel observations, with no pretraining required.
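
For readers unfamiliar with the definition the abstract above refers to, the standard successor-feature relations from the literature are written out below. The notation (features \phi, task vector w, discount \gamma) is an assumption chosen for illustration and is not taken from the paper; the paper's exact losses may differ. Roughly, a reward prediction loss fits the first relation, while a TD loss targets the Bellman form in the third.

```latex
% Standard successor-feature definitions; notation is assumed, not the paper's.
\begin{align}
r(s, a, s') &\approx \phi(s, a, s')^\top w, \\
\psi^\pi(s, a) &= \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^t \phi_{t+1} \,\middle|\, S_0 = s, A_0 = a\right], \\
\psi^\pi(s, a) &= \mathbb{E}_\pi\!\left[\phi_{1} + \gamma\, \psi^\pi(S_{1}, A_{1}) \,\middle|\, S_0 = s, A_0 = a\right], \\
Q^\pi(s, a) &= \psi^\pi(s, a)^\top w.
\end{align}
```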

2024-09-25

NeurIPS.cc/2024/Conference (poster)

Offline Multitask Representation Learning for Reinforcement Learning

Haque Ishfaq

Thanh Nguyen-Tang

Songtao Feng

Raman Arora

Mengdi Wang

Ming Yin