Publications

Batch Reinforcement Learning Through Continuation Method

Yijie Guo

Shengyu Feng

Ed Chi

Honglak Lee

Minmin Chen

Many real-world applications of reinforcement learning (RL) require the agent to learn from a fixed set of trajectories, without collecting new interactions. Policy optimization in this setting is extremely challenging because 1) the geometry of the objective function is hard to optimize efficiently, and 2) the shift in data distributions causes high noise in value estimation. In this work, we propose a simple yet effective policy iteration approach to batch RL using a global optimization technique known as continuation. By constraining the difference between the learned policy and the behavior policy that generated the fixed trajectories, and continuously relaxing the constraint, our method 1) helps the agent escape local optima and 2) reduces the error in policy evaluation during optimization. We present results on a variety of control tasks, game environments, and a recommendation task to empirically demonstrate the efficacy of our proposed method.

2021-01-01

ICLR (published)

openreview.net
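The constrain-then-relax idea in this abstract can be illustrated, under heavy simplification, with a single-state toy problem. This sketch is not the authors' algorithm (which runs policy iteration with value estimates learned from batch data). It assumes fixed, known action values `q`, a known behavior policy `beta` with strictly positive probabilities, and a KL penalty with weight `tau` standing in for the constraint; the helper name `kl_regularized_policy` and the `tau` schedule are hypothetical. A large `tau` keeps the policy close to the behavior policy, and shrinking `tau` gradually relaxes that constraint.

```python
import math


def kl_regularized_policy(q_values, behavior, tau):
    """Maximizer of E_pi[Q] - tau * KL(pi || behavior) in one state.

    Closed form: pi(a) is proportional to behavior(a) * exp(Q(a) / tau).
    Assumes every behavior probability is strictly positive.
    """
    logits = [math.log(b) + q / tau for q, b in zip(q_values, behavior)]
    top = max(logits)  # subtract the max before exponentiating for stability
    weights = [math.exp(lg - top) for lg in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical single-state example: 3 actions, behavior policy favors action 0.
q = [1.0, 2.0, 0.5]
beta = [0.7, 0.2, 0.1]

# Continuation-style schedule (hypothetical): start tightly constrained, then relax.
for tau in [10.0, 1.0, 0.1]:
    policy = kl_regularized_policy(q, beta, tau)
    print(tau, [round(p, 3) for p in policy])
```

With `tau = 10` the policy stays near `beta`; by `tau = 0.1` almost all mass moves to the highest-valued action.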

Can Open Source Licenses Help Regulate Lethal Autonomous Weapons?

Cheng Lin

AJung Moon

Lethal autonomous weapon systems (LAWS, also known as killer robots) are a real and emerging technology with the potential to radically transform warfare. Because of the myriad moral, legal, privacy, and security risks the technology introduces, many scholars and advocates have called for a ban on the development, production, and use of fully autonomous weapons [1], [2].

2021-01-01

IEEE Technology and Society Magazine (published)

doi.org

Capacity Expansion in the College Admission Problem

Federico Bobbio

Margarida Carvalho

Andrea Lodi

Alfredo Torrico

2021-01-01

arXiv.org (preprint)

dblp.uni-trier.de

Consistency and Rate of Convergence of Switched Least Squares System Identification for Autonomous Switched Linear Systems

Borna Sayedana

Mohammad Afshari

Peter E. Caines

Aditya Mahajan

In this paper, we investigate the problem of system identification for autonomous switched linear systems with complete state observations. We propose a switched least squares method for the identification of switched linear systems, show that this method is strongly consistent, and derive data-dependent and data-independent rates of convergence. In particular, our data-dependent rate of convergence shows that, almost surely, the system identification error is $O\big(\sqrt{\log(T)/T}\big)$, where $T$ is the time horizon. These results show that our method for switched linear systems has the same rate of convergence as the least squares method for non-switched linear systems. We compare our results with those in the literature. We present numerical examples to illustrate the performance of the proposed system identification method.

2021-01-01

arXiv.org (preprint)

dblp.uni-trier.de
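A per-mode least squares estimator of the kind this abstract names can be sketched for the scalar case. This is a simplification under stated assumptions, not the paper's implementation: the state is one-dimensional, $x_{t+1} = a_{\sigma_t} x_t + w_t$, the mode sequence $\sigma_t$ is assumed to be observed along with the state, and the simulator uses an i.i.d. uniformly random switching signal. For each mode $i$ the estimate is $\hat a_i = \sum_{t:\sigma_t=i} x_{t+1}x_t \big/ \sum_{t:\sigma_t=i} x_t^2$; in the vector case this would become $\hat A_i = \big(\sum x_{t+1}x_t^\top\big)\big(\sum x_t x_t^\top\big)^{-1}$. The function names `switched_least_squares` and `simulate` are hypothetical.

```python
import random
from collections import defaultdict


def switched_least_squares(states, modes):
    """Per-mode least squares for scalar x[t+1] = a[mode[t]] * x[t] + noise.

    states: x_0 .. x_T (length T + 1); modes: sigma_0 .. sigma_{T-1} (length T).
    Returns {mode: estimated coefficient} for every mode with nonzero data.
    """
    if len(states) != len(modes) + 1:
        raise ValueError("need exactly one more state than modes")
    num = defaultdict(float)  # running sum of x[t+1] * x[t] per mode
    den = defaultdict(float)  # running sum of x[t] ** 2 per mode
    for t, m in enumerate(modes):
        num[m] += states[t + 1] * states[t]
        den[m] += states[t] ** 2
    return {m: num[m] / den[m] for m in den if den[m] > 0}


def simulate(a, T, noise=0.1, seed=0):
    """Hypothetical simulator with an i.i.d. uniformly random switching signal."""
    rng = random.Random(seed)
    x = [1.0]
    modes = []
    for _ in range(T):
        m = rng.randrange(len(a))
        modes.append(m)
        x.append(a[m] * x[-1] + rng.gauss(0.0, noise))
    return x, modes


true_a = [0.5, -0.8]
xs, sigma = simulate(true_a, T=5000)
print(switched_least_squares(xs, sigma))  # estimates should be close to 0.5 and -0.8
```

Pooling the regression per mode is what lets each coefficient be estimated like an ordinary (non-switched) least squares problem on its own subset of transitions.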

Continual Learning via Local Module Composition

Oleksiy Ostapenko

Pau Rodriguez

Massimo Caccia

Laurent Charlin

Modularity is a compelling solution to continual learning (CL), the problem of modeling sequences of related tasks. Learning and then composing modules to solve different tasks provides an abstraction to address the principal challenges of CL, including catastrophic forgetting, backward and forward transfer across tasks, and sub-linear model growth. We introduce local module composition (LMC), an approach to modular CL where each module is provided a local structural component that estimates a module's relevance to the input. Dynamic module composition is performed layer-wise based on local relevance scores. We demonstrate that agnosticity to task identities (IDs) arises from (local) structural learning that is module-specific, as opposed to task- and/or model-specific as in previous works, making LMC applicable to more CL settings than previous methods. In addition, LMC tracks statistics about the input distribution and adds new modules when outlier samples are detected. In the first set of experiments, LMC performs favorably compared to existing methods on the recent Continual Transfer-learning Benchmark without requiring task identities. In another study, we show that the locality of structural learning allows LMC to interpolate to related but unseen tasks (OOD), as well as to compose modular networks trained independently on different task sequences into a third modular network without any fine-tuning. Finally, in search of limitations of LMC, we study it on more challenging sequences of 30 and 100 tasks, demonstrating that local module selection becomes much more challenging in the presence of a large number of candidate modules. In this setting, the best-performing LMC spawns far fewer modules than an oracle-based baseline; however, it reaches a lower overall accuracy. The codebase is available at https://github.com/oleksost/LMC.

openreview.net

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning

Rishabh Agarwal

Marlos C. Machado

Pablo Samuel Castro

Marc Gendron-Bellemare

Reinforcement learning methods trained on few environments rarely learn policies that generalize to unseen environments. To improve generalization, we incorporate the inherent sequential structure in reinforcement learning into the representation learning process. This approach is orthogonal to recent approaches, which rarely exploit this structure explicitly. Specifically, we introduce a theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states. PSM assigns high similarity to states for which the optimal policies in those states as well as in future states are similar. We also present a contrastive representation learning procedure to embed any state similarity metric, which we instantiate with PSM to obtain policy similarity embeddings (PSEs). We demonstrate that PSEs improve generalization on diverse benchmarks, including LQR with spurious correlations, a jumping task from pixels, and Distracting DM Control Suite.

2021-01-01

ICLR (published)

openreview.net

Data-Efficient Reinforcement Learning

Nitarshan Rajkumar

Michael Noukhovitch

Ankesh Anand

Laurent Charlin

(Rex) Devon Hjelm

Philip Bachman

Aaron Courville

Data efficiency poses a major challenge for deep reinforcement learning. We approach this issue from the perspective of self-supervised representation learning, leveraging reward-free exploratory data to pretrain encoder networks. We employ a novel combination of latent dynamics modelling and goal-reaching objectives, which exploit the inherent structure of data in reinforcement learning. We demonstrate that our method scales well with network capacity and pretraining data. When evaluated on the Atari 100k data-efficiency benchmark, our approach significantly outperforms previous methods combining unsupervised pretraining with task-specific finetuning, and approaches human-level performance.

2021-01-01

(published)

www.semanticscholar.org

Deep LDA-Pruned Nets for Efficient Facial Gender Classification

Qing Tian

Tal Arbel

James J. Clark

Many real-time tasks, such as human-computer interaction, require fast and efficient facial gender classification. Although deep CNNs have been very effective for a multitude of classification tasks, their high space and time demands make them impractical for personal computers and mobile devices without a powerful GPU. In this paper, we develop a 16-layer, yet lightweight, neural network that boosts efficiency while maintaining high accuracy. Our net is pruned from the VGG-16 model [35], starting from the last convolutional (conv) layer, where we find that neuron activations are highly uncorrelated given the gender. Through Fisher's Linear Discriminant Analysis (LDA) [8], we show that this high decorrelation makes it safe to directly discard last-conv-layer neurons with high within-class variance and low between-class variance. Combined with either Support Vector Machines (SVM) or Bayesian classification, the reduced CNNs achieve accuracies on the LFW and CelebA datasets comparable to (or even higher than) those of the original net with fully connected layers. On LFW, only four conv5_3 neurons are able to maintain a comparably high recognition accuracy, which results in a 70-fold reduction in total network size and an 11-fold speedup. Comparisons with a state-of-the-art pruning method [12] (as well as two smaller nets [20, 24]) in terms of accuracy loss and convolutional-layer pruning rate are also provided.

2021-01-01

(published)

www.semanticscholar.org
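The pruning criterion in this abstract (discard neurons with high within-class variance and low between-class variance) can be approximated with a per-neuron Fisher ratio. This is a simplified stand-in for the paper's LDA analysis, not its implementation: it scores each neuron independently, weights classes by their sample proportion, and the activations, labels, and helper names `fisher_scores` and `keep_top_k` are hypothetical.

```python
from statistics import mean, pvariance


def fisher_scores(activations, labels):
    """Per-neuron Fisher ratio: between-class variance / within-class variance.

    activations: list of samples, each a list of neuron activations.
    labels: class label (e.g. 0 or 1) for each sample.
    """
    classes = sorted(set(labels))
    scores = []
    for j in range(len(activations[0])):
        col = [a[j] for a in activations]
        overall = mean(col)
        between = 0.0
        within = 0.0
        for c in classes:
            vals = [v for v, y in zip(col, labels) if y == c]
            weight = len(vals) / len(col)  # class proportion
            between += weight * (mean(vals) - overall) ** 2
            within += weight * pvariance(vals)
        scores.append(between / within if within > 0 else float("inf"))
    return scores


def keep_top_k(scores, k):
    """Indices of the k most discriminative neurons; the rest would be pruned."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical activations of 3 neurons for 4 face images (labels 0/1).
acts = [[1.0, 0.5, 0.2], [1.2, 0.4, 0.9], [3.0, 0.6, 0.1], [3.1, 0.5, 0.8]]
labels = [0, 0, 1, 1]
print(keep_top_k(fisher_scores(acts, labels), k=1))  # neuron 0 separates the classes
```

Neuron 2 varies a lot within each class but barely between classes, so it is the first candidate for removal under this criterion.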

Deep Reinforcement Learning at the Edge of the Statistical Precipice

Rishabh Agarwal

Max Schwarzer

Pablo Samuel Castro

Aaron Courville

Marc Gendron-Bellemare

Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally demanding benchmarks has led to the practice of evaluating only a small number of runs per task, exacerbating the statistical uncertainty in point estimates. In this paper, we argue that reliable evaluation in the few-run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field. We illustrate this point using a case study on the Atari 100k benchmark, where we find substantial discrepancies between conclusions drawn from point estimates alone versus a more thorough statistical analysis. With the aim of increasing the field's confidence in reported results with a handful of runs, we advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results, as well as present more robust and efficient aggregate metrics, such as interquartile mean scores, to achieve small uncertainty in results. Using such statistical tools, we scrutinize performance evaluations of existing algorithms on other widely used RL benchmarks including the ALE, Procgen, and the DeepMind Control Suite, again revealing discrepancies in prior comparisons. Our findings call for a change in how we evaluate performance in deep RL, for which we present a more rigorous evaluation methodology, accompanied by an open-source library, rliable, to prevent unreliable results from stagnating the field. This work received an outstanding paper award at NeurIPS 2021.

openreview.net
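The interquartile mean (IQM) mentioned in this abstract is the mean of the middle 50% of scores, pooled across all runs and tasks, after discarding the lowest and highest 25%. The sketch below is a stdlib-only illustration of that definition, not the rliable library's API; the function name `interquartile_mean` and the example scores are hypothetical. It follows the usual trimmed-mean convention of dropping `int(0.25 * n)` values from each end.

```python
from statistics import mean


def interquartile_mean(scores):
    """Mean of the middle 50% of scores (25% trimmed from each end)."""
    if not scores:
        raise ValueError("scores must be non-empty")
    ordered = sorted(scores)
    cut = int(0.25 * len(ordered))  # number of values dropped on each side
    middle = ordered[cut:len(ordered) - cut]
    return mean(middle)


# Hypothetical normalized scores: 2 tasks x 4 runs, pooled into one list.
runs = [0.1, 0.4, 0.5, 0.6, 0.7, 0.8, 1.2, 5.0]
print(interquartile_mean(runs))  # mean of [0.5, 0.6, 0.7, 0.8]
print(mean(runs))                # the single outlier 5.0 inflates the plain mean
```

Unlike the median, IQM still averages half of all runs, and unlike the mean it is insensitive to a few outlier runs, which is the robustness-versus-efficiency trade-off the abstract refers to.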

Embedding Signals on Knowledge Graphs with Unbalanced Diffusion Earth Mover's Distance

Alexander Tong

Guillaume Huguet

Dennis L. Shung

Amine Natik

Manik Kuchroo

Guillaume Lajoie

Guy Wolf

Smita Krishnaswamy

In modern relational machine learning it is common to encounter large graphs that arise via interactions or similarities between observations in many domains. Further

2021-01-01

arXiv.org (preprint)

dblp.uni-trier.de

Enjeux juridiques propres au modèle émergent des patients accompagnateurs dans les milieux de soins au Québec (Legal Issues Arising from the Emerging Model of Accompanying Patients in the Quebec Healthcare System)

Léa Boutrouille

Catherine Régis

Marie-Pascale Pomey

2021-01-01

SSRN Electronic Journal (published)

doi.org
