Aaron Courville

Reza Bayat

PhD - Université de Montréal

Co-supervisor :

Pascal Vincent

Anirudh Buvanesh

PhD - Université de Montréal

Principal supervisor :

Laurent Charlin

Abhranil Chandra

Collaborating researcher - University of Waterloo

Master's Research - Université de Montréal

Juan Duque

PhD - Université de Montréal

PhD - Université de Montréal

Arian Hosseini

PhD - Université de Montréal

Amr Khalifa

PhD - Université de Montréal

Samuel Lavoie

PhD - Université de Montréal

Zhixuan Lin

PhD - Université de Montréal

Ahmed Masry

Collaborating researcher - N/A

Alan Milligan

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

PhD - Université de Montréal

Co-supervisor :

Rishabh Agarwal

Andrei Nicolicioiu

PhD - Université de Montréal

Evgenii Nikishin

Collaborating Alumni - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

Michell Mercedes Payano Perez

Johan Samir Obando Ceron

PhD - Université de Montréal

Co-supervisor :

Collaborating researcher - Université de Montréal

Dereck Piché

Master's Research - Université de Montréal

Khaled Rouissi

Master's Research - Université de Montréal

Esra'a Saleh

PhD - Université de Montréal

Principal supervisor :

Glen Berseth

Vedant Shah

PhD - Université de Montréal

PhD - Université de Montréal

Yusong Wu

PhD - Université de Montréal

Principal supervisor :

Anna (Cheng-Zhi) Huang

Sujin Yun

PhD - Université de Montréal

Xiaofeng Zhang

PhD - Université de Montréal

Dinghuai Zhang

PhD - Université de Montréal

Co-supervisor :

Yoshua Bengio

Hattie Zhou

PhD - Université de Montréal

Principal supervisor :

Hugo Larochelle

Publications

Online black-box adaptation to label-shift in the presence of conditional-shift

Faruk Ahmed

We consider an out-of-distribution setting where trained predictive models are deployed online in new locations (inducing conditional-shift), such that these locations are also associated with differently skewed target distributions (label-shift). While approaches for online adaptation to label-shift have recently been discussed by Wu et al. (2021), the potential presence of concurrent conditional-shift has not been considered in the literature, although one might anticipate such distributional shifts in realistic deployments. In this paper, we empirically explore the effectiveness of online adaptation methods in such situations on three synthetic and two realistic datasets, comprising both classification and regression problems. We show that it is possible to improve performance in these settings by learning additional hyper-parameters to account for the presence of conditional-shift by using appropriate validation sets.

2023-02-01

ICLR.cc/2023/Conference (rejected)

Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier

Marc Gendron-Bellemare

Increasing the replay ratio, the number of updates of an agent's parameters per environment interaction, is an appealing strategy for improving the sample efficiency of deep reinforcement learning algorithms. In this work, we show that fully or partially resetting the parameters of deep reinforcement learning agents causes better replay ratio scaling capabilities to emerge. We push the limits of the sample efficiency of carefully-modified algorithms by training them using an order of magnitude more updates than usual, significantly improving their performance in the Atari 100k and DeepMind Control Suite benchmarks. We then provide an analysis of the design choices required for favorable replay ratio scaling to be possible and discuss inherent limits and tradeoffs.

2023-02-01

ICLR.cc/2023/Conference (notable)

Simplicial Embeddings in Self-Supervised Learning and Downstream Classification

Samuel Lavoie

Simplicial Embeddings (SEM) are representations learned through self-supervised learning (SSL), wherein a representation is projected into …

2023-02-01

ICLR.cc/2023/Conference (notable)

Bigger, Better, Faster: Human-level Atari with human-level efficiency

Max Schwarzer

Johan Samir Obando Ceron

Marc Gendron-Bellemare

Rishabh Agarwal

Pablo Samuel Castro

We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at https://github.com/google-research/google-research/tree/master/bigger_better_faster.

2023-01-01

ICML (published)

Expressiveness and Learnability: A Unifying View for Evaluating Self-Supervised Learning

Yuchen Lu

Romain Laroche

2023-01-01

Trans. Mach. Learn. Res. (published)

Noisy Pairing and Partial Supervision for Stylized Opinion Summarization

Reinald Kim

Mirella Lapata

Maxinder S. Kan-620

Somnath Basu

Roy Chowdhury

Chao Zhao

Tanya Goyal

Junyi Jiacheng Xu

Jessy Li

Ivor W. Tsang

James T. Kwok

Neil Houlsby

Andrei Giurgiu

Stanisław Jastrzębski

Bruna Morrone

Quentin de Laroussilhe

Mona Gesmundo

Attariyan Sylvain

Gelly

Thomas Wolf

Lysandre Debut

Julien Victor Sanh

Clement Chaumond

Anthony Delangue

Pier-339 Moi

Tim ric Cistac

Rémi Rault

Morgan Louf

Funtow-900 Joe

Sam Davison

Patrick Shleifer

Von Platen

Clara Ma

Yacine Jernite

Julien Plu

Canwen Xu

Opinion summarization research has primarily focused on generating summaries reflecting important opinions from customer reviews without paying much attention to the writing style. In this paper, we propose the stylized opinion summarization task, which aims to generate a summary of customer reviews in the desired (e.g., professional) writing style. To tackle the difficulty in collecting customer and professional review pairs, we develop a non-parallel training framework, Noisy Pairing and Partial Supervision (NAPA), which trains a stylized opinion summarization system from non-parallel customer and professional review sets. We create a benchmark ProSum by collecting customer and professional reviews from Yelp and Michelin. Experimental results on ProSum and FewSum demonstrate that our non-parallel training framework consistently improves both automatic and human evaluations, successfully building a stylized opinion summarization model that can generate professionally-written summaries from customer reviews.

2023-01-01

(published)

www.semanticscholar.org

Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods

Yuchen Lu

Romain Laroche

2023-01-01

Trans. Mach. Learn. Res. (published)

Versatile Energy-Based Models for High Energy Physics

Taoli Cheng

2023-01-01

arXiv.org (preprint)

Teaching Algorithmic Reasoning via In-context Learning

Hattie Zhou

Azade Nova

Hugo Larochelle

Behnam Neyshabur

Hanie Sedghi

2022-11-15

ArXiv (preprint)

On the Compositional Generalization Gap of In-Context Learning

Pretrained large generative language models have shown great performance on many tasks, but exhibit low compositional generalization abilities. Scaling such models has been shown to improve their performance on various NLP tasks even just by conditioning them on a few examples to solve the task without any fine-tuning (also known as in-context learning). In this work, we look at the gap between the in-distribution (ID) and out-of-distribution (OOD) performance of such models in semantic parsing tasks with in-context learning. In the ID settings, the demonstrations are from the same split (test or train) that the model is being evaluated on, and in the OOD settings, they are from the other split. We look at how the relative generalization gap of in-context learning evolves as models are scaled up. We evaluate four model families, OPT, BLOOM, CodeGen and Codex on three semantic parsing datasets, CFQ, SCAN and GeoQuery with different number of exemplars, and observe a trend of decreasing relative generalization gap as models are scaled up.

2022-11-15

ArXiv (preprint)

arxiv.org

Invariant representation driven neural classifier for anti-QCD jet tagging

Taoli Cheng

2022-10-24

Journal of High Energy Physics (published)

arxiv.org

Latent State Marginalization as a Low-cost Approach for Improving Exploration

Qinqing Zheng

Ricky T. Q. Chen

While the maximum entropy (MaxEnt) reinforcement learning (RL) framework -- often touted for its exploration and robustness capabilities -- is usually motivated from a probabilistic perspective, the use of deep probabilistic models has not gained much traction in practice due to their inherent complexity. In this work, we propose the adoption of latent variable policies within the MaxEnt framework, which we show can provably approximate any policy distribution, and additionally, naturally emerges under the use of world models with a latent belief state. We discuss why latent variable policies are difficult to train, how naive approaches can fail, then subsequently introduce a series of improvements centered around low-cost marginalization of the latent state, allowing us to make full use of the latent state at minimal additional cost. We instantiate our method under the actor-critic framework, marginalizing both the actor and critic. The resulting algorithm, referred to as Stochastic Marginal Actor-Critic (SMAC), is simple yet effective. We experimentally validate our method on continuous control tasks, showing that effective marginalization can lead to better exploration and more robust training. Our implementation is open sourced at https://github.com/zdhNarsil/Stochastic-Marginal-Actor-Critic.

2022-10-03

ArXiv (preprint)