Publications

Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress

Bellemare Marc-Emmanuel

Learning tabula rasa, that is, without any prior knowledge, is the prevalent workflow in reinforcement learning (RL) research. However, RL systems, when applied to large-scale settings, rarely operate tabula rasa. Such large-scale systems undergo multiple design or algorithmic changes during their development cycle and use ad hoc approaches for incorporating these changes without re-training from scratch, which would have been prohibitively expensive. Additionally, the inefficiency of deep RL typically excludes researchers without access to industrial-scale resources from tackling computationally-demanding problems. To address these issues, we present reincarnating RL as an alternative workflow or class of problem settings, where prior computational work (e.g., learned policies) is reused or transferred between design iterations of an RL agent, or from one RL agent to another. As a step towards enabling reincarnating RL from any agent to any other agent, we focus on the specific setting of efficiently transferring an existing sub-optimal policy to a standalone value-based RL agent. We find that existing approaches fail in this setting and propose a simple algorithm to address their limitations. Equipped with this algorithm, we demonstrate reincarnating RL's gains over tabula rasa RL on Atari 2600 games, a challenging locomotion task, and the real-world problem of navigating stratospheric balloons. Overall, this work argues for an alternative approach to RL research, which we believe could significantly improve real-world RL adoption and help democratize it further. Open-sourced code and trained agents are available at https://agarwl.github.io/reincarnating_rl.

2021-12-31

Advances in Neural Information Processing Systems 35 (NeurIPS 2022) (published)

doi.org

openreview.net

Representational ethical model calibration

Robert Carruthers

Isabel Straw

James K. Ruffle

Daniel Herron

Amy Nelson

Danilo Bzdok

Delmiro Fernandez-Reyes

Geraint Rees

Parashkev Nachev

2021-12-31

npj Digit. Medicine (published)

doi.org

arxiv.org

Revisiting Heterophily For Graph Neural Networks

Sitao Luan

Chenqing Hua

Qincheng Lu

Jiaqi Zhu

Shuyuan Zhang

Mingde Zhao

Xiao-Wen Chang

Doina Precup

Graph Neural Networks (GNNs) extend basic Neural Networks (NNs) by using graph structures based on the relational inductive bias (homophily assumption). While GNNs have been commonly believed to outperform NNs in real-world tasks, recent work has identified a non-trivial set of datasets where their performance compared to NNs is not satisfactory. Heterophily has been considered the main cause of this empirical observation and numerous works have been put forward to address it. In this paper, we first revisit the widely used homophily metrics and point out that their consideration of only graph-label consistency is a shortcoming. Then, we study heterophily from the perspective of post-aggregation node similarity and define new homophily metrics, which are potentially advantageous compared to existing ones. Based on this investigation, we prove that some harmful cases of heterophily can be effectively addressed by a local diversification operation. Then, we propose Adaptive Channel Mixing (ACM), a framework that adaptively exploits aggregation, diversification and identity channels node-wise to extract richer localized information for diverse node heterophily situations. ACM is more powerful than the commonly used uni-channel framework for node classification tasks on heterophilic graphs and is easy to implement in baseline GNN layers. When evaluated on 10 benchmark node classification tasks, ACM-augmented baselines consistently achieve significant performance gains, exceeding state-of-the-art GNNs on most tasks without incurring significant computational burden.

2021-12-31

Advances in Neural Information Processing Systems 35 (NeurIPS 2022) (published)

doi.org

openreview.net

Riemannian Diffusion Models

Avishek Joey Bose

Diffusion models are recent state-of-the-art methods for image generation and likelihood estimation. In this work, we generalize continuous-time diffusion models to arbitrary Riemannian manifolds and derive a variational framework for likelihood estimation. Computationally, we propose new methods for computing the Riemannian divergence which is needed in the likelihood estimation. Moreover, in generalizing the Euclidean case, we prove that maximizing this variational lower-bound is equivalent to Riemannian score matching. Empirically, we demonstrate the expressive power of Riemannian diffusion models on a wide spectrum of smooth manifolds, such as spheres, tori, hyperboloids, and orthogonal groups. Our proposed method achieves new state-of-the-art likelihoods on all benchmarks.

2021-12-31

Advances in Neural Information Processing Systems 35 (NeurIPS 2022) (published)

doi.org

openreview.net

Robust Policy Learning over Multiple Uncertainty Sets

Annie Xie

Shagun Sodhani

Chelsea Finn

Joelle Pineau

Amy Zhang

Reinforcement learning (RL) agents need to be robust to variations in safety-critical environments. While system identification methods provide a way to infer the variation from online experience, they can fail in settings where fast identification is not possible. Another dominant approach is robust RL which produces a policy that can handle worst-case scenarios, but these methods are generally designed to achieve robustness to a single uncertainty set that must be specified at train time. Towards a more general solution, we formulate the multi-set robustness problem to learn a policy robust to different perturbation sets. We then design an algorithm that enjoys the benefits of both system identification and robust RL: it reduces uncertainty where possible given a few interactions, but can still act robustly with respect to the remaining uncertainty. On a diverse set of control tasks, our approach demonstrates improved worst-case performance on new environments compared to prior methods based on system identification and on robust RL alone.

2021-12-31

ICML (published)

proceedings.mlr.press

Robustness of Whittle Index Policy to Model Approximation

Amit Sinha

Aditya Mahajan

2021-12-31

Social Science Research Network (published)

doi.org

Scalable Operator Allocation for Multirobot Assistance: A Restless Bandit Approach

Abhinav Dahiya

Nima Akbarzadeh

Aditya Mahajan

Stephen L. Smith

In this article, we consider the problem of allocating human operators in a system with multiple semiautonomous robots. Each robot is required to perform an independent sequence of tasks, subject to a chance of failing and getting stuck in a fault state at every task. If and when required, a human operator can assist or teleoperate a robot. Conventional dynamic programming-based techniques used to solve such problems face scalability issues due to an exponential growth of state and action spaces with the number of robots and operators. In this article, we derive conditions under which the operator allocation problem satisfies a technical condition called indexability, thereby enabling the use of the Whittle index heuristic. The conditions are easy to check, and we show that they hold for a wide range of problems of interest. Our key insight is to leverage the structure of the value function of individual robots, resulting in conditions that can be verified separately for each state of each robot. We apply these conditions to two types of transitions commonly seen in remote robot supervision systems. Through numerical simulations, we demonstrate the efficacy of the Whittle index policy as a near-optimal and scalable approach that outperforms existing scalable methods.

2021-12-31

IEEE Transactions on Control of Network Systems (published)

doi.org

arxiv.org

Sharpness-Aware Training for Accurate Inference on Noisy DNN Accelerators

Goncalo Mordido

A. Chandar

François Leduc-Primeau

Energy-efficient deep neural network (DNN) accelerators are prone to non-idealities that degrade DNN performance at inference time. To mitigate such degradation, existing methods typically add perturbations to the DNN weights during training to simulate inference on noisy hardware. However, this often requires knowledge about the target hardware and leads to a trade-off between DNN performance and robustness, decreasing the former to increase the latter. In this work, we show that applying sharpness-aware training by optimizing for both the loss value and the loss sharpness significantly improves robustness to noisy hardware at inference time while also increasing DNN performance. We further motivate our results by showing a high correlation between loss sharpness and model robustness. We show superior performance compared to injecting noise during training and aggressive weight clipping on multiple architectures, optimizers, datasets, and training regimes without relying on any assumptions about the target hardware. This is observed on a generic noise model as well as on accurate noise simulations from real hardware.

2021-12-31

arXiv.org (preprint)

doi.org

Single Allocation Hub Location with Heterogeneous Economies of Scale

Borzou Rostami

Masoud Chitsaz

Okan Arslan

Gilbert Laporte

Andrea Lodi

The economies of scale in hub location are usually modeled by a constant parameter, which captures the benefits companies obtain through consolidation. In their article “Single allocation hub location with heterogeneous economies of scale,” Rostami et al. relax this assumption and consider hub-hub connection costs as piecewise linear functions of the flow amounts. This spoils the triangular inequality property of the distance matrix, making the classical flow-based model invalid and further complicating the problem. The authors tackle the challenge by building a mixed-integer quadratically constrained program and by developing a methodology based on constructing a Lagrangian function, linear dual functions, and specialized polynomial-time algorithms to generate enhanced cuts. The developed method offers a new strategy in Benders-type decomposition through relaxing a set of complicating constraints in subproblems when such relaxation is tight. The results confirm the efficacy of the solution methods in solving large-scale problem instances.

2021-12-31

Operations Research (published)

doi.org

Sociotechnical Harms: Scoping a Taxonomy for Harm Reduction

Renee Shelby

Shalaleh Rismani

Kathryn Henne

AJung Moon

Negar Rostamzadeh

Paul Nicholas

N'mah Fodiatu Yilla

Jess Gallegos