Publications

Batch Reinforcement Learning Through Continuation Method
Yijie Guo
Shengyu Feng
Ed Chi
Honglak Lee
Minmin Chen
Many real-world applications of reinforcement learning (RL) require the agent to learn from a fixed set of trajectories, without collecting … (see more)new interactions. Policy optimization under this setting is extremely challenging as: 1) the geometry of the objective function is hard to optimize efficiently; 2) the shift of data distributions causes high noise in the value estimation. In this work, we propose a simple yet effective policy iteration approach to batch RL using global optimization techniques known as continuation. By constraining the difference between the learned policy and the behavior policy that generates the fixed trajectories, and continuously relaxing the constraint, our method 1) helps the agent escape local optima; 2) reduces the error in policy evaluation in the optimization procedure. We present results on a variety of control tasks, game environments, and a recommendation task to empirically demonstrate the efficacy of our proposed method.
Can Open Source Licenses Help Regulate Lethal Autonomous Weapons?
Cheng Lin
Lethal autonomous weapon systems (LAWS, ethal autonomous weapon also known as killer robots) are a real and emerging technology that have th… (see more)e potential to radically transform warfare. Because of the myriad of moral, legal, privacy, and security risks the technology introduces, many scholars and advocates have called for a ban on the development, production, and use of fully autonomous weapons [1], [2].
Capacity Expansion in the College Admission Problem
Federico Bobbio
Alfredo Torrico
Consistency and Rate of Convergence of Switched Least Squares System Identification for Autonomous Switched Linear Systems
Borna Sayedana
Mohammad Afshari
Peter E. Caines
In this paper, we investigate the problem of system identification for autonomous switched linear systems with complete state observations.… (see more) We propose switched least squares method for the identification for switched linear systems, show that this method is strongly consistent, and derive data-dependent and data-independent rates of convergence. In particular, our data-dependent rate of convergence shows that, almost surely, the system identification error is O (cid:0)(cid:112) log( T ) /T (cid:1) where T is the time horizon. These results show that our method for switched linear systems has the same rate of convergence as least squares method for non-switched linear systems. We compare our results with those in the literature. We present numerical examples to illustrate the performance of the proposed system identification method.
Continual Learning via Local Module Composition
Oleksiy Ostapenko
Pau Rodriguez
Massimo Caccia
Modularity is a compelling solution to continual learning (CL), the problem of modeling sequences of related tasks. Learning and then compos… (see more)ing modules to solve different tasks provides an abstraction to address the principal challenges of CL including catastrophic forgetting, backward and forward transfer across tasks, and sub-linear model growth. We introduce local module composition (LMC), an approach to modular CL where each module is provided a local structural component that estimates a module's relevance to the input. Dynamic module composition is performed layer-wise based on local relevance scores. We demonstrate that agnosticity to task identities (IDs) arises from (local) structural learning that is module-specific as opposed to the task- and/or model-specific as in previous works, making LMC applicable to more CL settings compared to previous works. In addition, LMC also tracks statistics about the input distribution and adds new modules when outlier samples are detected. In the first set of experiments, LMC performs favorably compared to existing methods on the recent Continual Transfer-learning Benchmark without requiring task identities. In another study, we show that the locality of structural learning allows LMC to interpolate to related but unseen tasks (OOD), as well as to compose modular networks trained independently on different task sequences into a third modular network without any fine-tuning. Finally, in search for limitations of LMC we study it on more challenging sequences of 30 and 100 tasks, demonstrating that local module selection becomes much more challenging in presence of a large number of candidate modules. In this setting best performing LMC spawns much fewer modules compared to an oracle based baseline, however, it reaches a lower overall accuracy. The codebase is available under https://github.com/oleksost/LMC.
Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Rishabh Agarwal
Marlos C. Machado
Reinforcement learning methods trained on few environments rarely learn policies that generalize to unseen environments. To improve generali… (see more)zation, we incorporate the inherent sequential structure in reinforcement learning into the representation learning process. This approach is orthogonal to recent approaches, which rarely exploit this structure explicitly. Specifically, we introduce a theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states. PSM assigns high similarity to states for which the optimal policies in those states as well as in future states are similar. We also present a contrastive representation learning procedure to embed any state similarity metric, which we instantiate with PSM to obtain policy similarity embeddings (PSEs). We demonstrate that PSEs improve generalization on diverse benchmarks, including LQR with spurious correlations, a jumping task from pixels, and Distracting DM Control Suite.
DATA-EFFICIENT REINFORCEMENT LEARNING
Nitarshan Rajkumar
Michael Noukhovitch
Ankesh Anand
Philip Bachman
Data efficiency poses a major challenge for deep reinforcement learning. We approach this issue from the perspective of self-supervised repr… (see more)esentation learning, leveraging reward-free exploratory data to pretrain encoder networks. We employ a novel combination of latent dynamics modelling and goal-reaching objectives, which exploit the inherent structure of data in reinforcement learning. We demonstrate that our method scales well with network capacity and pretraining data. When evaluated on the Atari 100k data-efficiency benchmark, our approach significantly outperforms previous methods combining unsupervised pretraining with task-specific finetuning, and approaches human-level performance.
Deep LDA-Pruned Nets for Efficient Facial Gender Classification
Qing Tian
James J. Clark
Many real-time tasks, such as human-computer interac-tion, require fast and efficient facial gender classification. Although deep CNN nets… (see more) have been very effective for a mul-titude of classification tasks, their high space and time de-mands make them impractical for personal computers and mobile devices without a powerful GPU. In this paper, we develop a 16-layer, yet lightweight, neural network which boosts efficiency while maintaining high accuracy. Our net is pruned from the VGG-16 model [35] starting from the last convolutional (conv) layer where we find neuron activations are highly uncorrelated given the gender. Through Fisher’s Linear Discriminant Analysis (LDA) [8], we show that this high decorrelation makes it safe to discard directly last conv layer neurons with high within-class variance and low between-class variance. Combined with either Support Vector Machines (SVM) or Bayesian classification, the reduced CNNs are capable of achieving comparable (or even higher) accuracies on the LFW and CelebA datasets than the original net with fully connected layers. On LFW, only four Conv5 3 neurons are able to maintain a comparably high recognition accuracy, which results in a reduction of total network size by a factor of 70X with a 11 fold speedup. Comparisons with a state-of-the-art pruning method [12] (as well as two smaller nets [20, 24]) in terms of accuracy loss and convolutional layers pruning rate are also provided.
Deep Reinforcement Learning at the Edge of the Statistical Precipice
Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. M… (see more)ost published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally-demanding benchmarks has led to the practice of evaluating only a small number of runs per task, exacerbating the statistical uncertainty in point estimates. In this paper, we argue that reliable evaluation in the few run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field. We illustrate this point using a case study on the Atari 100k benchmark, where we find substantial discrepancies between conclusions drawn from point estimates alone versus a more thorough statistical analysis. With the aim of increasing the field's confidence in reported results with a handful of runs, we advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results, as well as present more robust and efficient aggregate metrics, such as interquartile mean scores, to achieve small uncertainty in results. Using such statistical tools, we scrutinize performance evaluations of existing algorithms on other widely used RL benchmarks including the ALE, Procgen, and the DeepMind Control Suite, again revealing discrepancies in prior comparisons. Our findings call for a change in how we evaluate performance in deep RL, for which we present a more rigorous evaluation methodology, accompanied with an open-source library rliable, to prevent unreliable results from stagnating the field. This work received an outstanding paper award at NeurIPS 2021.
Embedding Signals on Knowledge Graphs with Unbalanced Diffusion Earth Mover's Distance
Alexander Tong
Guillaume Huguet
Dennis L. Shung
Amine Natik
Manik Kuchroo
Smita Krishnaswamy
In modern relational machine learning it is common to encounter large graphs that arise via interactions or similarities between observation… (see more)s in many domains. Further
Embedding Signals on Knowledge Graphs with Unbalanced Diffusion Earth Mover's Distance
Alexander Tong
Guillaume Huguet
Dennis Shung
Amine Natik
Manik Kuchroo
Smita Krishnaswamy
In modern relational machine learning it is common to encounter large graphs that arise via interactions or similarities between observation… (see more)s in many domains. Further
Enjeux juridiques propres au modèle émergent des patients accompagnateurs dans les milieux de soins au Québec (Legal Issues Arising from the Emerging Model of Accompanying Patients in the Quebec Healthcare System)
Léa Boutrouille
Marie-Pascale Pomey