Parallel and Recurrent Cascade Models as a Unifying Force for Understanding Subcellular Computation
Emerson F. Harkin
Peter R. Shen
Anisha Goel
Richard Naud
Neurons are very complicated computational devices, incorporating numerous non-linear processes, particularly in their dendrites. Biophysical models capture these processes directly by explicitly modelling physiological variables, such as ion channels, current flow, and membrane capacitance. However, another option for capturing the complexities of real neural computation is to use cascade models, which treat individual neurons as a cascade of linear and non-linear operations, akin to a multi-layer artificial neural network. Recent research has shown that cascade models can capture single-cell computation well, but there are still a number of sub-cellular, regenerative dendritic phenomena that they cannot capture, such as the interaction between sodium, calcium, and NMDA spikes in different compartments. Here, we propose that it is possible to capture these additional phenomena using parallel, recurrent cascade models, wherein an individual neuron is modelled as a cascade of parallel linear and non-linear operations that can be connected recurrently, akin to a multi-layer, recurrent, artificial neural network. Given their tractable mathematical structure, we show that neuron models expressed in terms of parallel recurrent cascades can themselves be integrated into multi-layered artificial neural networks and trained to perform complex tasks. We go on to discuss potential implications and uses of these models for artificial intelligence. Overall, we argue that parallel, recurrent cascade models provide an important, unifying tool for capturing single-cell computation and exploring the algorithmic implications of physiological phenomena.
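To make the cascade idea concrete, below is a minimal sketch of a parallel, recurrent cascade unit in Python. The class, weight shapes, and sigmoid non-linearity are illustrative assumptions, not the authors' model: each dendritic subunit filters its inputs together with the recurrent state of its sibling subunits, and the soma reads out a linear combination.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class CascadeNeuron:
    # Hypothetical parallel, recurrent cascade unit: parallel linear
    # stages with recurrent coupling, each followed by a static
    # non-linearity, feeding a linear somatic readout.
    def __init__(self, n_subunits, n_inputs, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.5, (n_subunits, n_inputs))     # input weights
        self.W_rec = rng.normal(0.0, 0.1, (n_subunits, n_subunits))  # recurrent coupling
        self.w_out = rng.normal(0.0, 0.5, n_subunits)                # somatic readout
        self.h = np.zeros(n_subunits)                                # subunit state

    def step(self, x):
        # Parallel non-linear subunits driven by inputs and by each other.
        self.h = sigmoid(self.W_in @ x + self.W_rec @ self.h)
        # Somatic output: a linear combination of subunit activations.
        return float(self.w_out @ self.h)

Because every operation here is differentiable, units like this can be dropped into a larger network and trained end-to-end, which is the tractability argument the abstract makes.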
The default mode network in cognition: a topographical perspective
Jonathan Smallwood
Boris C Bernhardt
Robert Leech
Elizabeth Jefferies
Daniel S. Margulies
Beyond variance reduction: Understanding the true impact of baselines on policy optimization
Wesley Chung
Valentin Thomas
Marlos C. Machado
Continuous Coordination As a Realistic Scenario for Lifelong Learning
Hadi Nekoei
Akilesh Badrinaaraayanan
Current deep reinforcement learning (RL) algorithms are still highly task-specific and lack the ability to generalize to new environments. Lifelong learning (LLL), however, aims at solving multiple tasks sequentially by efficiently transferring and using knowledge between tasks. Despite a surge of interest in lifelong RL in recent years, the lack of a realistic testbed makes robust evaluation of LLL algorithms difficult. Multi-agent RL (MARL), on the other hand, can be seen as a natural scenario for lifelong RL due to its inherent non-stationarity, since the agents' policies change over time. In this work, we introduce a multi-agent lifelong learning testbed that supports both zero-shot and few-shot settings. Our setup is based on Hanabi -- a partially-observable, fully cooperative multi-agent game that has been shown to be challenging for zero-shot coordination. Its large strategy space makes it a desirable environment for lifelong RL tasks. We evaluate several recent MARL methods, and benchmark state-of-the-art LLL algorithms in limited memory and computation regimes to shed light on their strengths and weaknesses. This continual learning paradigm also provides us with a pragmatic way of going beyond centralized training, which is the most commonly used training protocol in MARL. We empirically show that the agents trained in our setup are able to coordinate well with unseen agents, without the additional assumptions made by previous works. The code and all pre-trained models are available at https://github.com/chandar-lab/Lifelong-Hanabi.
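The cross-play evaluation at the heart of such a testbed can be illustrated with a toy cooperative game. The classes below are made-up stand-ins, not the Lifelong-Hanabi API:

import numpy as np

class MatchingGame:
    # Toy stand-in for a cooperative game: both players are rewarded
    # only when their simultaneous actions match. Hanabi itself is
    # partially observable and turn-based; this is illustration only.
    def play(self, a1, a2):
        return 1.0 if a1 == a2 else 0.0

class FixedConventionAgent:
    # An agent that always plays its learned "convention".
    def __init__(self, preferred):
        self.preferred = preferred
    def act(self):
        return self.preferred

def cross_play(agent, partners, game, episodes=100):
    # Zero-shot coordination: average return of `agent` when paired
    # with partners it never trained with.
    returns = [game.play(agent.act(), p.act())
               for p in partners for _ in range(episodes)]
    return float(np.mean(returns))

agent = FixedConventionAgent(preferred=0)
unseen = [FixedConventionAgent(preferred=a) for a in range(3)]
print(cross_play(agent, unseen, MatchingGame()))  # 1/3: conventions clash with 2 of 3 partners

Agents that score well in self-play but poorly in cross-play have overfit to their training partners, which is exactly the failure mode this kind of benchmark probes.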
A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation
Scott Fujimoto
Marginalized importance sampling (MIS), which measures the density ratio between the state-action occupancy of a target policy and that of a sampling distribution, is a promising approach for off-policy evaluation. However, current state-of-the-art MIS methods rely on complex optimization tricks and succeed mostly on simple toy problems. We bridge the gap between MIS and deep reinforcement learning by observing that the density ratio can be computed from the successor representation of the target policy. The successor representation can be trained through deep reinforcement learning methodology and decouples the reward optimization from the dynamics of the environment, making the resulting algorithm stable and applicable to high-dimensional domains. We evaluate the empirical performance of our approach on a variety of challenging Atari and MuJoCo environments.
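The key identity can be stated in a few lines of tabular code: the discounted occupancy of the target policy falls out of its successor representation, and dividing by the sampling distribution gives the MIS density ratio. The toy transition matrix below is illustrative; in the paper the successor representation is learned with deep TD methods rather than computed in closed form.

import numpy as np

n, gamma = 3, 0.9
P_pi = np.array([[0.0, 1.0, 0.0],   # toy transition matrix under the target policy
                 [0.0, 0.0, 1.0],
                 [1.0, 0.0, 0.0]])
d0 = np.array([1.0, 0.0, 0.0])      # initial distribution

# Successor representation: expected discounted visitation counts,
# M_pi = (I - gamma * P_pi)^{-1}.
M_pi = np.linalg.inv(np.eye(n) - gamma * P_pi)

# Normalized discounted occupancy of the target policy.
d_pi = (1 - gamma) * d0 @ M_pi
assert np.isclose(d_pi.sum(), 1.0)

# MIS density ratio against a (here, uniform) sampling distribution.
d_D = np.full(n, 1.0 / n)
print(d_pi / d_D)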
Directional Graph Networks
Saro Passaro
Vincent Létourneau
William L. Hamilton
Gabriele Corso
Pietro Lio
The lack of anisotropic kernels in graph neural networks (GNNs) strongly limits their expressiveness, contributing to well-known issues such as over-smoothing. To overcome this limitation, we propose the first globally consistent anisotropic kernels for GNNs, allowing for graph convolutions that are defined according to topologically-derived directional flows. First, by defining a vector field in the graph, we develop a method of applying directional derivatives and smoothing by projecting node-specific messages into the field. Then, we propose the use of the Laplacian eigenvectors as such a vector field. We show that the method generalizes CNNs on an n-dimensional grid and is provably more discriminative than standard GNNs regarding the Weisfeiler-Lehman 1-WL test.
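A toy version of the directional aggregation looks as follows. It is a simplified sketch of the paper's aggregation matrices, using the Fiedler vector of a small path graph as the vector field:

import numpy as np

A = np.array([[0, 1, 0, 0],          # 4-node path graph
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(1)) - A            # graph Laplacian

_, eigvecs = np.linalg.eigh(L)
phi1 = eigvecs[:, 1]                 # Fiedler vector: a smooth global "direction"

# Edge field F_ij = phi1[j] - phi1[i], restricted to existing edges.
F = A * (phi1[None, :] - phi1[:, None])
Fhat = F / (np.abs(F).sum(1, keepdims=True) + 1e-12)

# Directional derivative of node features along the field:
# dx_i = sum_j Fhat_ij * (x_j - x_i).
x = np.array([1.0, 2.0, 4.0, 8.0])   # toy node features
dx = Fhat @ x - Fhat.sum(1) * x
print(np.round(dx, 3))

On a grid graph the Fiedler vector varies monotonically along the longest axis, which is the intuition behind the claim that the construction recovers the directional behaviour of a CNN kernel there.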
Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards
Susan Amin
Maziar Gomrokchi
Hossein Aboutalebi
Harsh Satija
A major challenge in reinforcement learning is the design of exploration strategies, especially for environments with sparse reward structures and continuous state and action spaces. Intuitively, if the reinforcement signal is very scarce, the agent should rely on some form of short-term memory in order to cover its environment efficiently. We propose a new exploration method based on two intuitions: (1) the choice of the next exploratory action should depend not only on the (Markovian) state of the environment, but also on the agent's trajectory so far, and (2) the agent should use a measure of spread in the state space to avoid getting stuck in a small region. Our method leverages concepts from statistical physics that explain the behavior of simplified (polymer) chains in order to generate persistent (locally self-avoiding) trajectories in state space. We discuss the theoretical properties of locally self-avoiding walks and their ability to provide a kind of short-term memory through a decaying temporal correlation within the trajectory. We provide empirical evaluations of our approach in a simulated 2D navigation task, as well as higher-dimensional MuJoCo continuous control locomotion tasks with sparse rewards.
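The short-term-memory intuition can be sketched directly: among random candidate moves, prefer the one farthest, under exponentially decaying weights, from recently visited states. This is an illustrative re-implementation of the intuition only, not the paper's polymer-derived action distribution.

import numpy as np

rng = np.random.default_rng(0)
state = np.zeros(2)                   # 2D continuous state
memory, decay, horizon = [], 0.9, 20  # decaying short-term trajectory memory

for t in range(200):
    candidates = state + 0.1 * rng.normal(size=(16, 2))  # random action proposals
    if memory:
        past = np.stack(memory)                          # recent states
        weights = decay ** np.arange(len(memory))[::-1]  # recent ones weigh more
        # Score each candidate by its weighted mean distance to the
        # recent trajectory, i.e. prefer locally self-avoiding moves.
        dists = np.linalg.norm(candidates[:, None] - past[None], axis=2)
        state = candidates[np.argmax((dists * weights).mean(1))]
    else:
        state = candidates[0]
    memory = (memory + [state.copy()])[-horizon:]

print(np.round(state, 3))  # the walk has drifted well away from the origin

Because the memory decays, the trajectory is only locally self-avoiding: the agent keeps covering new ground in the short run but can still revisit regions later, matching the decaying temporal correlation discussed above.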
RNN with Particle Flow for Probabilistic Spatio-temporal Forecasting
Soumyasundar Pal
Liheng Ma
Yingxue Zhang
Spatio-temporal forecasting has numerous applications in analyzing wireless, traffic, and financial networks. Classical statistical models often fall short in handling the complexity and high non-linearity present in time-series data. Recent advances in deep learning allow for better modelling of spatial and temporal dependencies. While most of these models focus on obtaining accurate point forecasts, they do not characterize the prediction uncertainty. In this work, we consider the time-series data as a random realization from a nonlinear state-space model and target Bayesian inference of the hidden states for probabilistic forecasting. We use particle flow as the tool for approximating the posterior distribution of the states, as it is shown to be highly effective in complex, high-dimensional settings. Thorough experimentation on several real-world time-series datasets demonstrates that our approach provides better characterization of uncertainty while maintaining accuracy comparable to state-of-the-art point forecasting methods.
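Particle flow can be illustrated in one dimension with the classic exact (Daum-Huang) flow for a linear-Gaussian measurement: particles sampled from the prior are transported through a pseudo-time lambda in [0, 1] so that they end up distributed as the posterior, sidestepping the weight degeneracy of importance sampling. The scalar model below is a minimal sketch under those linear-Gaussian assumptions; the paper embeds such flows inside a recurrent nonlinear state-space model.

import numpy as np

rng = np.random.default_rng(0)
P, R, H, z = 1.0, 0.5, 1.0, 2.0                # prior var, noise var, obs model, observation
prior_mean = 0.0
particles = rng.normal(prior_mean, np.sqrt(P), 1000)  # samples from the prior

n_steps = 100
for k in range(n_steps):
    lam, dlam = (k + 0.5) / n_steps, 1.0 / n_steps
    # Exact-flow coefficients for the linear-Gaussian case:
    # dx/dlam = A(lam) * x + b(lam).
    Acoef = -0.5 * P * H * H / (lam * H * P * H + R)
    b = (1 + 2 * lam * Acoef) * ((1 + lam * Acoef) * P * H / R * z
                                 + Acoef * prior_mean)
    particles += dlam * (Acoef * particles + b)

# The migrated particles approximately match the analytic posterior.
post_var = 1.0 / (1.0 / P + H * H / R)
print(particles.mean(), post_var * H * z / R)  # both close to 4/3
print(particles.var(), post_var)               # both close to 1/3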
Smart About Meds (SAM): a pilot randomized controlled trial of a mobile application to improve medication adherence following hospital discharge
Bettina Habib
Melissa Bustillo
Santiago Nicolas Marquez
Manish Thakur
Thai Tran
Daniala L Weir
Robyn Tamblyn
Structure-Aware Reinforcement Learning for Node-Overload Protection in Mobile Edge Computing
Anirudha Jitani
Zhongwen Zhu
Hatem Abou-Zeid
Emmanuel Thepie Fapi
Hakimeh Purmehdi
Mobile Edge Computing (MEC) involves placing computational capability and applications at the edge of the network, providing benefits such as reduced latency, reduced network congestion, and improved application performance. The performance and reliability of MEC degrade significantly when the edge server(s) in the cluster are overloaded. In this work, we present an adaptive admission control policy to prevent edge nodes from becoming overloaded. This approach is based on a recently proposed low-complexity RL (Reinforcement Learning) algorithm called SALMUT (Structure-Aware Learning for Multiple Thresholds), which exploits the structure of the optimal admission control policy in multi-class queues for an average-cost setting. We extend the framework to the node overload-protection problem in a discounted-cost setting. The proposed solution is validated using several scenarios mimicking real-world deployments in two different settings: computer simulations and a Docker testbed. Our empirical evaluations show that the total discounted cost incurred by SALMUT is similar to that of state-of-the-art deep RL algorithms such as PPO (Proximal Policy Optimization) and A2C (Advantage Actor Critic), but SALMUT requires an order of magnitude less time to train, outputs easily interpretable policies, and can be deployed in an online manner.
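The structural property SALMUT exploits, namely that the optimal admission policy is of threshold type, can be sketched in a few lines. The class below is a toy illustration with made-up parameters, not the actual SALMUT update rules:

import numpy as np

class ThresholdAdmission:
    # Threshold-structured admission control: one learnable scalar per
    # request class instead of a full state-action value table, which
    # is what makes the learned policy fast to train and interpretable.
    def __init__(self, n_classes, max_load):
        self.thresholds = np.full(n_classes, max_load / 2.0)

    def admit(self, cls, load):
        # Admit a class-`cls` request only while the node load is
        # below that class's threshold.
        return load < self.thresholds[cls]

    def nudge(self, cls, delta):
        # Learning updates move a single scalar per class.
        self.thresholds[cls] = max(0.0, self.thresholds[cls] + delta)

policy = ThresholdAdmission(n_classes=2, max_load=100)
print(policy.admit(0, load=30))   # True: load below the class-0 threshold of 50
policy.nudge(1, -10.0)            # tighten admission for class 1
print(policy.admit(1, load=45))   # False: class-1 threshold is now 40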
Measures of balance in combinatorial optimization
Philippe Olivier
Andrea Lodi
Gilles Pesant