Publications

Capacity Expansion in the College Admission Problem

Federico Bobbio

Margarida Carvalho

Andrea Lodi

Alfredo Torrico

2020-12-31

arXiv.org (preprint)

dblp.uni-trier.de

ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models

Salva Rühling Cachay

Venkatesh Ramesh

Jason N. S. Cole

Howard Barker

David Rolnick

Numerical simulations of Earth's weather and climate require substantial amounts of computation. This has led to a growing interest in replacing subroutines that explicitly compute physical processes with approximate machine learning (ML) methods that are fast at inference time. Within weather and climate models, atmospheric radiative transfer (RT) calculations are especially expensive. This has made them a popular target for neural network-based emulators. However, prior work is hard to compare due to the lack of a comprehensive dataset and standardized best practices for ML benchmarking. To fill this gap, we build a large dataset, ClimART, with more than 10 million samples from present, pre-industrial, and future climate conditions, based on the Canadian Earth System Model. ClimART poses several methodological challenges for the ML community, such as multiple out-of-distribution test sets, underlying domain physics, and a trade-off between accuracy and inference speed. We also present several novel baselines that indicate shortcomings of datasets and network architectures used in prior work.

2020-12-31

NeurIPS Datasets and Benchmarks (published)

openreview.net

Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data

Jonathan Pilault

Amine El hattami

Christopher Pal

Multi-Task Learning (MTL) networks have emerged as a promising method for transferring learned knowledge across different tasks. However, MTL must deal with challenges such as: overfitting to low resource tasks, catastrophic forgetting, and negative task transfer, or learning interference. Often, in Natural Language Processing (NLP), a separate model per task is needed to obtain the best performance. However, many fine-tuning approaches are both parameter inefficient, i.e., potentially involving one new model per task, and highly susceptible to losing knowledge acquired during pretraining. We propose a novel Transformer based Hypernetwork Adapter consisting of a new conditional attention mechanism as well as a set of task-conditioned modules that facilitate weight sharing. Through this construction, we achieve more efficient parameter sharing and mitigate forgetting by keeping half of the weights of a pretrained model fixed. We also use a new multi-task data sampling strategy to mitigate the negative effects of data imbalance across tasks. Using this approach, we are able to surpass single task fine-tuning methods while being parameter and data efficient (using around 66% of the data). Compared to other BERT Large methods on GLUE, our 8-task model surpasses other Adapter methods by 2.8% and our 24-task model outperforms by 0.7-1.0% models that use MTL and single task fine-tuning. We show that a larger variant of our single multi-task model approach performs competitively across 26 NLP tasks and yields state-of-the-art results on a number of test and development sets.

2020-12-31

ICLR (published)

openreview.net

A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

Mingde Zhao

We present an end-to-end, model-based deep reinforcement learning agent which dynamically attends to relevant parts of its state during planning. The agent uses a bottleneck mechanism over a set-based representation to force the number of entities to which the agent attends at each planning step to be small. In experiments, we investigate the bottleneck mechanism with several sets of customized environments featuring different challenges. We consistently observe that the design allows the planning agents to generalize their learned task-solving abilities in compatible unseen environments by attending to the relevant objects, leading to better out-of-distribution generalization performance.

2020-12-31

Advances in Neural Information Processing Systems 34 (NeurIPS 2021) (published)

doi.org

openreview.net

Consistency and Rate of Convergence of Switched Least Squares System Identification for Autonomous Switched Linear Systems

Borna Sayedana

Mohammad Afshari

Peter E. Caines

Aditya Mahajan

In this paper, we investigate the problem of system identification for autonomous switched linear systems with complete state observations. We propose switched least squares method for the identification for switched linear systems, show that this method is strongly consistent, and derive data-dependent and data-independent rates of convergence. In particular, our data-dependent rate of convergence shows that, almost surely, the system identification error is $O\big(\sqrt{\log(T)/T}\big)$ where $T$ is the time horizon. These results show that our method for switched linear systems has the same rate of convergence as least squares method for non-switched linear systems. We compare our results with those in the literature. We present numerical examples to illustrate the performance of the proposed system identification method.

2020-12-31

arXiv.org (preprint)

dblp.uni-trier.de

Continual Learning via Local Module Composition

Oleksiy Ostapenko

Pau Rodríguez

Massimo Caccia

Laurent Charlin

Modularity is a compelling solution to continual learning (CL), the problem of modeling sequences of related tasks. Learning and then composing modules to solve different tasks provides an abstraction to address the principal challenges of CL including catastrophic forgetting, backward and forward transfer across tasks, and sub-linear model growth. We introduce local module composition (LMC), an approach to modular CL where each module is provided a local structural component that estimates a module's relevance to the input. Dynamic module composition is performed layer-wise based on local relevance scores. We demonstrate that agnosticity to task identities (IDs) arises from (local) structural learning that is module-specific as opposed to the task- and/or model-specific as in previous works, making LMC applicable to more CL settings compared to previous works. In addition, LMC also tracks statistics about the input distribution and adds new modules when outlier samples are detected. In the first set of experiments, LMC performs favorably compared to existing methods on the recent Continual Transfer-learning Benchmark without requiring task identities. In another study, we show that the locality of structural learning allows LMC to interpolate to related but unseen tasks (OOD), as well as to compose modular networks trained independently on different task sequences into a third modular network without any fine-tuning. Finally, in search for limitations of LMC we study it on more challenging sequences of 30 and 100 tasks, demonstrating that local module selection becomes much more challenging in presence of a large number of candidate modules. In this setting best performing LMC spawns much fewer modules compared to an oracle based baseline, however, it reaches a lower overall accuracy. The codebase is available under https://github.com/oleksost/LMC.

2020-12-31

Advances in Neural Information Processing Systems 34 (NeurIPS 2021) (published)

doi.org

openreview.net

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning

Rishabh Agarwal

Marlos C. Machado

Pablo Samuel Castro

Bellemare Marc-Emmanuel

Reinforcement learning methods trained on few environments rarely learn policies that generalize to unseen environments. To improve generalization, we incorporate the inherent sequential structure in reinforcement learning into the representation learning process. This approach is orthogonal to recent approaches, which rarely exploit this structure explicitly. Specifically, we introduce a theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states. PSM assigns high similarity to states for which the optimal policies in those states as well as in future states are similar. We also present a contrastive representation learning procedure to embed any state similarity metric, which we instantiate with PSM to obtain policy similarity embeddings (PSEs). We demonstrate that PSEs improve generalization on diverse benchmarks, including LQR with spurious correlations, a jumping task from pixels, and Distracting DM Control Suite.

2020-12-31

ICLR (published)

openreview.net

Convex Potential Flows: Universal Probability Distributions with Optimal Transport and Convex Optimization

Chin-wei Huang

Ricky T. Q. Chen

Christos Tsirigotis

Aaron Courville

Flow-based models are powerful tools for designing probabilistic models with tractable density. This paper introduces Convex Potential Flows (CP-Flow), a natural and efficient parameterization of invertible models inspired by the optimal transport (OT) theory. CP-Flows are the gradient map of a strongly convex neural potential function. The convexity implies invertibility and allows us to resort to convex optimization to solve the convex conjugate for efficient inversion. To enable maximum likelihood training, we derive a new gradient estimator of the log-determinant of the Jacobian, which involves solving an inverse-Hessian vector product using the conjugate gradient method. The gradient estimator has constant-memory cost, and can be made effectively unbiased by reducing the error tolerance level of the convex optimization routine. Theoretically, we prove that CP-Flows are universal density approximators and are optimal in the OT sense. Our empirical results show that CP-Flow performs competitively on standard benchmarks of density estimation and variational inference.

2020-12-31

ICLR (published)

doi.org

openreview.net

Cooperative Semi-Supervised Transfer Learning of Machine Reading Comprehension

Oliver Bender

Franz Josef Och

Yoshua Bengio

Réjean Ducharme

P Vincent

Kevin Clark

Minh-Thang Luong

Quoc V. Le

Jacob Devlin

Ming-Wei Chang

Kenton Lee

Adam Fisch

Alon Talmor

Robin Jia

Minjoon Seo

Michael R. Glass

A. Gliozzo

Rishav Chakravarti

Ian J Goodfellow

Jean Pouget-Abadie

Mehdi Mirza

Serhii Havrylov

Ivan Titov

Jun-Tao He

Jiatao Gu

Jiajun Shen

Marc’Aurelio

Matthew Henderson

I. Casanueva

Nikola Mrkšić

Pei-hao Su

Tsung-Hsien Wen

Ivan Vulić

Yikang Shen

Yi Tay

Che Zheng

Dara Bahri

Donald Metzler

Aaron Courville

Ashish Vaswani

Noam M. Shazeer

Niki Parmar

Thomas Wolf

Lysandre Debut

Victor Sanh

Julien Chaumond

Clement Delangue

Anthony Moi

Pierric Cistac

Tim Rault

Rémi Louf

Morgan

Qizhe Xie

Eduard H. Hovy

Silei Xu

Sina Jandaghi Semnani

Giovanni Campagna

Pretrained language models have significantly improved the performance of downstream language understanding tasks, including extractive question answering, by providing high-quality contextualized word embeddings. However, training question answering models still requires large amounts of annotated data for specific domains. In this work, we propose a cooperative, self-play learning framework, REGEX, for automatically generating more non-trivial question-answer pairs to improve model performance. REGEX is built upon a masked answer extraction task with an interactive learning environment containing an answer entity REcognizer, a question Generator, and an answer EXtractor. Given a passage with a masked entity, the generator generates a question around the entity, and the extractor is trained to extract the masked entity with the generated question and raw texts. The framework allows the training of question generation and answering models on any text corpora without annotation. We further leverage a reinforcement learning technique to reward generating high-quality questions and to improve the answer extraction model's performance. Experiment results show that REGEX outperforms the state-of-the-art (SOTA) pretrained language models and transfer learning approaches on standard question-answering benchmarks, and yields the new SOTA performance under given model size and transfer learning settings.

2020-12-31

(published)

www.semanticscholar.org

Correcting Momentum in Temporal Difference Learning

Emmanuel Bengio

Joelle Pineau

Doina Precup

A common optimization tool used in deep reinforcement learning is momentum, which consists in accumulating and discounting past gradients, reapplying them at each iteration. We argue that, unlike in supervised learning, momentum in Temporal Difference (TD) learning accumulates gradients that become doubly stale: not only does the gradient of the loss change due to parameter updates, the loss itself changes due to bootstrapping. We first show that this phenomenon exists, and then propose a first-order correction term to momentum. We show that this correction term improves sample efficiency in policy evaluation by correcting target value drift. An important insight of this work is that deep RL methods are not always best served by directly importing techniques from the supervised setting.

2020-12-31

arXiv (preprint)

doi.org

openreview.net

Data-Efficient Reinforcement Learning with Self-Predictive Representations

Max Schwarzer

Ankesh Anand

Rishab Goel

R Devon Hjelm

Aaron Courville

Philip Bachman

While deep reinforcement learning excels at solving tasks where large amounts of data can be collected through virtually unlimited interaction with the environment, learning from limited interaction remains a key challenge. We posit that an agent can learn more efficiently if we augment reward maximization with self-supervised objectives based on structure in its visual input and sequential interaction with the environment. Our method, Self-Predictive Representations (SPR), trains an agent to predict its own latent state representations multiple steps into the future. We compute target representations for future states using an encoder which is an exponential moving average of the agent's parameters and we make predictions using a learned transition model. On its own, this future prediction objective outperforms prior methods for sample-efficient deep RL from pixels. We further improve performance by adding data augmentation to the future prediction loss, which forces the agent's representations to be consistent across multiple views of an observation. Our full self-supervised objective, which combines future prediction and data augmentation, achieves a median human-normalized score of 0.415 on Atari in a setting limited to 100k steps of environment interaction, which represents a 55% relative improvement over the previous state-of-the-art. Notably, even in this limited data regime, SPR exceeds expert human scores on 7 out of 26 games. The code associated with this work is available at https://github.com/mila-iqia/spr

2020-12-31

ICLR (published)

doi.org

openreview.net

DATA-EFFICIENT REINFORCEMENT LEARNING

R Devon Hjelm

Philip Bachman

Aaron Courville

Data efficiency poses a major challenge for deep reinforcement learning. We approach this issue from the perspective of self-supervised representation learning, leveraging reward-free exploratory data to pretrain encoder networks. We employ a novel combination of latent dynamics modelling and goal-reaching objectives, which exploit the inherent structure of data in reinforcement learning. We demonstrate that our method scales well with network capacity and pretraining data. When evaluated on the Atari 100k data-efficiency benchmark, our approach significantly outperforms previous methods combining unsupervised pretraining with task-specific finetuning, and approaches human-level performance.

2020-12-31

(published)

www.semanticscholar.org
