Aaron Courville

Razvan Ciuca

Maîtrise recherche - Université de Montréal

Alexandre Diz Ganito

Maîtrise recherche - UdeM

Juan Duque

Doctorat - UdeM

Doctorat - UdeM

Doctorat - UdeM

Uday Kapur

Maîtrise professionnelle - UdeM

Amr Khalifa

Doctorat - UdeM

andrei.nicolicioiu@gmail.com

Samuel Lavoie

Doctorat - UdeM

Zhixuan Lin

Doctorat - UdeM

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Doctorat - UdeM

Doctorat - UdeM

Co-superviseur⋅e :

Rishabh Agarwal

Andrei Nicolicioiu

Doctorat - UdeM

Site web

Google Scholar

Evgenii Nikishin

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Doctorat - UdeM

Co-superviseur⋅e :

Johan Samir Obando Ceron

Doctorat - UdeM

Co-superviseur⋅e :

Maîtrise recherche - UdeM

pichedereck@gmail.com

Site web

Esra'a Saleh

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Maîtrise recherche - UdeM

Superviseur⋅e principal⋅e :

Anna (Cheng-Zhi) Huang

Shawn Tan

Doctorat - UdeM

Doctorat - UdeM

Superviseur⋅e principal⋅e :

(Rex) Devon Hjelm

Google Scholar

Yusong Wu

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Anna (Cheng-Zhi) Huang

Dinghuai Zhang

Doctorat - UdeM

Co-superviseur⋅e :

Doctorat - UdeM

Hattie Zhou

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Hugo Larochelle

Publications

SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision

Ankit Vani

Bac Nguyen

Samuel Lavoie

Ranjay Krishna

Selective attention helps us focus on task-relevant aspects in the constant flood of our sensory input. This constraint in our perception al… (voir plus)lows us to robustly generalize under distractions and to new compositions of perceivable concepts. Transformers employ a similar notion of attention in their architecture, but representation learning models with transformer backbones like CLIP and DINO often fail to demonstrate robustness and compositionality. We highlight a missing architectural prior: unlike human perception, transformer encodings do not separately attend over individual concepts. In response, we propose SPARO, a read-out mechanism that partitions encodings into separately-attended slots, each produced by a single attention head. Using SPARO with CLIP imparts an inductive bias that the vision and text modalities are different views of a shared compositional world with the same corresponding concepts. Using SPARO, we demonstrate improvements on downstream recognition, robustness, retrieval, and compositionality benchmarks with CLIP (up to +14% for ImageNet, +4% for SugarCrepe), and on nearest neighbors and linear probe for ImageNet with DINO (+3% each). We also showcase a powerful ability to intervene and select individual SPARO concepts to further improve downstream task performance (up from +4% to +9% for SugarCrepe) and use this ability to study the robustness of SPARO's representation structure. Finally, we provide insights through ablation experiments and visualization of learned concepts.

2024-01-01

ECCV (66) (publié)

arxiv.org

On the consistency of hyper-parameter selection in value-based deep reinforcement learning

Johan Samir Obando Ceron

João Guilherme Madeira Araújo

Pablo Samuel Castro

Deep reinforcement learning (deep RL) has achieved tremendous success on various domains through a combination of algorithmic design and car… (voir plus)eful selection of hyper-parameters. Algorithmic improvements are often the result of iterative enhancements built upon prior approaches, while hyper-parameter choices are typically inherited from previous methods or fine-tuned specifically for the proposed technique. Despite their crucial impact on performance, hyper-parameter choices are frequently overshadowed by algorithmic advancements. This paper conducts an extensive empirical study focusing on the reliability of hyper-parameter selection for value-based deep reinforcement learning agents, including the introduction of a new score to quantify the consistency and reliability of various hyper-parameters. Our findings not only help establish which hyper-parameters are most critical to tune, but also help clarify which tunings remain consistent across different training regimes.

2024-01-01

RLJ (publié)

Language Model Alignment with Elastic Reset

Michael Noukhovitch

Samuel Lavoie

Florian Strub

Finetuning language models with reinforcement learning (RL), e.g. from human feedback (HF), is a prominent method for alignment. But optimiz… (voir plus)ing against a reward model can improve on reward while degrading performance in other areas, a phenomenon known as reward hacking, alignment tax, or language drift. First, we argue that commonly-used test metrics are insufficient and instead measure how different algorithms tradeoff between reward and drift. The standard method modified the reward with a Kullback-Lieber (KL) penalty between the online and initial model. We propose Elastic Reset, a new algorithm that achieves higher reward with less drift without explicitly modifying the training objective. We periodically reset the online model to an exponentially moving average (EMA) of itself, then reset the EMA model to the initial model. Through the use of an EMA, our model recovers quickly after resets and achieves higher reward with less drift in the same number of steps. We demonstrate that fine-tuning language models with Elastic Reset leads to state-of-the-art performance on a small scale pivot-translation benchmark, outperforms all baselines in a medium-scale RLHF-like IMDB mock sentiment task and leads to a more performant and more aligned technical QA chatbot with LLaMA-7B. Code available at github.com/mnoukhov/elastic-reset.

2023-12-06

ArXiv (prépublication)

arxiv.org

Learning Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy

Max Schwarzer

Jesse Farebrother

Joshua Greaves

Kevin Roccapriore

Ekin Dogus Cubuk

Rishabh Agarwal

Marc Gendron-Bellemare

Sergei Kalinin

Igor Mordatch

Pablo Samuel Castro

We introduce a machine learning approach to determine the transition rates of silicon atoms on a single layer of carbon atoms, when stimulat… (voir plus)ed by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural network to predict transition rates. These rates are then applied to guide a single silicon atom throughout the lattice to pre-determined target destinations. We present empirical analyses that demonstrate the efficacy and generality of our approach.

2023-10-27

NeurIPS.cc/2023/Workshop/AI4Mat (spotlight)

Learning Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy

Max Schwarzer

Jesse Farebrother

Joshua Greaves

Kevin Roccapriore

Ekin Dogus Cubuk

Rishabh Agarwal

Marc Gendron-Bellemare

Sergei Kalinin

Igor Mordatch

Pablo Samuel Castro

2023-10-27

NeurIPS.cc/2023/Workshop/AI4Mat (spotlight)

Sparse Universal Transformer

Shawn Tan

Yikang Shen

Zhenfang Chen

Chuang Gan

The Universal Transformer (UT) is a variant of the Transformer that shares parameters across its layers and is Turing-complete under certain… (voir plus) assumptions. Empirical evidence also shows that UTs have better compositional generalization than Vanilla Transformers (VTs) in formal language tasks. The parameter-sharing also affords it better parameter efficiency than VTs. Despite its many advantages, most state-of-the-art NLP systems use VTs as their backbone model instead of UTs. This is mainly because scaling UT parameters is more compute and memory intensive than scaling up a VT. This paper proposes the Sparse Universal Transformer (SUT), which leverages Sparse Mixture of Experts (SMoE) to reduce UT's computation complexity while retaining its parameter efficiency and generalization ability. Experiments show that SUT combines the best of both worlds, achieving strong generalization results on formal language tasks (Logical inference and CFQ) and impressive parameter and computation efficiency on standard natural language benchmarks like WMT'14.

2023-10-07

EMNLP/2023/Conference (accepté)

Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization

Dinghuai Zhang

Ricky T. Q. Chen

Cheng-Hao Liu

Yoshua Bengio

We tackle the problem of sampling from intractable high-dimensional density functions, a fundamental task that often appears in machine lear… (voir plus)ning and statistics. We extend recent sampling-based approaches that leverage controlled stochastic processes to model approximate samples from these target densities. The main drawback of these approaches is that the training objective requires full trajectories to compute, resulting in sluggish credit assignment issues due to use of entire trajectories and a learning signal present only at the terminal time. In this work, we present Diffusion Generative Flow Samplers (DGFS), a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments, via parameterizing an additional"flow function". Our method takes inspiration from the theory developed for generative flow networks (GFlowNets), allowing us to make use of intermediate learning signals. Through various challenging experiments, we demonstrate that DGFS achieves more accurate estimates of the normalization constant than closely-related prior methods.

2023-10-04

ArXiv (prépublication)

arxiv.org

Double Gumbel Q-Learning.

David Yu-Tung Hui

Pierre-Luc Bacon