Publications

InfoBot: Structured Exploration in Reinforcement Learning Using Information Bottleneck

D. Strouse

Matthew Botvinick

Sergey Levine

2018-12-31

(published)

www.semanticscholar.org

InfoBot: Transfer and Exploration via the Information Bottleneck

Daniel Strouse

Matthew Botvinick

Sergey Levine

A central challenge in reinforcement learning is discovering effective policies for tasks where rewards are sparsely distributed. We postulate that in the absence of useful reward signals, an effective exploration strategy should seek out decision states. These states lie at critical junctions in the state space from which the agent can transition to new, potentially unexplored regions. We propose to learn about decision states from prior experience. By training a goal-conditioned policy with an information bottleneck, we can identify decision states by examining where the model actually leverages the goal state. We find that this simple mechanism effectively identifies decision states, even in partially observed settings. In effect, the model learns the sensory cues that correlate with potential subgoals. In new environments, this model can then identify novel subgoals for further exploration, guiding the agent through a sequence of potential decision states and through new regions of the state space.
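The abstract locates decision states by asking where the goal-conditioned policy actually uses the goal. One rough proxy, assumed here and not taken from the paper, scores each state by the average KL divergence between the goal-conditioned action distribution and a goal-agnostic default policy. If the default is taken to be the goal-averaged policy and goals are drawn uniformly, this score equals the mutual information between action and goal at that state. The toy `policy`, `goals`, and state names below are hypothetical.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def decision_scores(policy, goals, states):
    """Score each state by how much the action distribution depends on the goal.

    policy(state, goal) -> list of action probabilities (hypothetical interface).
    The default policy is assumed to be the average of policy(state, g) over goals.
    """
    scores = {}
    for s in states:
        dists = [policy(s, g) for g in goals]
        n_actions = len(dists[0])
        default = [sum(d[a] for d in dists) / len(dists) for a in range(n_actions)]
        scores[s] = sum(kl(d, default) for d in dists) / len(dists)
    return scores

# Toy example: in a corridor the action ignores the goal; at a junction it does not.
def toy_policy(state, goal):
    if state == "corridor":
        return [1.0, 0.0]
    return [1.0, 0.0] if goal == "left" else [0.0, 1.0]

scores = decision_scores(toy_policy, ["left", "right"], ["corridor", "junction"])
print(scores)
```

States with high scores are candidate decision states; in the toy example only the junction scores above zero (log 2).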

2018-12-31

ICLR.cc/2019/Conference (poster)

doi.org

openreview.net

Interpolated Adversarial Training: Achieving Robust Neural Networks without Sacrificing Accuracy

Alex Lamb

Vikas Verma

Juho Kannala

Yoshua Bengio

Adversarial robustness has become a central goal in deep learning, both in theory and in practice. However, successful methods for improving adversarial robustness (such as adversarial training) greatly hurt generalization performance on clean data. This could have a major impact on how adversarial robustness affects real-world systems (e.g., many may opt to forgo robustness if doing so improves performance on clean data). We propose Interpolated Adversarial Training, which employs recently proposed interpolation-based training methods in the framework of adversarial training. On CIFAR-10, adversarial training increases clean test error from 5.8% to 16.7%, whereas with our Interpolated Adversarial Training we retain adversarial robustness while achieving a clean test error of only 6.5%. With our technique, the relative error increase for the robust model is reduced from 187.9% to just 12.1%.
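The two relative-increase figures follow from the clean test errors quoted above, taking the 5.8% error of standard training as the baseline. A quick check, where the helper name `relative_increase` is ours:

```python
def relative_increase(baseline, new):
    """Relative increase of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0

clean_error = 5.8  # clean test error of standard training (from the abstract)
adv_training = relative_increase(clean_error, 16.7)   # standard adversarial training
interpolated = relative_increase(clean_error, 6.5)    # Interpolated Adversarial Training
print(f"{adv_training:.1f}% {interpolated:.1f}%")  # 187.9% 12.1%
```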

2018-12-31

arXiv.org (preprint)

dblp.uni-trier.de

Lagrangian neurodynamics for real-time error-backpropagation across cortical areas

Dominik Dold

Akos F. Kungl

João Sacramento

Mihai A. Petrovici

Kaspar Anton Schindler

Jonathan Binas

Yoshua Bengio

Walter Senn


2018-12-31

(published)

www.semanticscholar.org

Learning Brain Dynamics from Calcium Imaging with Coupled van der Pol and LSTM

Germán Abrevaya

Irina Rish

Aleksandr Y. Aravkin

Guillermo Cecchi

James Kozloski

Pablo Polosecki

Peng Zheng

Silvina Ponce Dawson

Juliana Y. Rhee

David Daniel Cox

Many real-world data sets, especially in biology, are produced by complex nonlinear dynamical systems. In this paper, we focus on brain calcium imaging (CaI) of different organisms (zebrafish and rat), aiming to build a model of joint activation dynamics in large neuronal populations, including the whole brain of zebrafish. We propose a new approach for capturing the dynamics of temporal SVD components that uses the coupled (multivariate) van der Pol (VDP) oscillator, a nonlinear ordinary differential equation (ODE) model describing neural activity, together with a new parameter estimation technique that combines variable projection optimization and stochastic search. We show that the approach successfully handles nonlinearities and hidden state variables in the coupled VDP. The approach is accurate, achieving 0.82 to 0.94 correlation between the actual and model-generated components, and interpretable, as the VDP coupling matrix reveals anatomically meaningful positive (excitatory) and negative (inhibitory) interactions across different brain subsystems corresponding to spatial SVD components. Moreover, VDP is comparable to (or sometimes better than) recurrent neural networks (LSTM) for short-term prediction of future brain activity; VDP needs fewer parameters to train, which was an advantage on our small training data. Finally, the overall best predictive method, greatly outperforming both VDP and LSTM in short- and long-term predictive settings on both datasets, was the new hybrid VDP-LSTM approach, which used VDP to simulate a large domain-specific dataset for LSTM pretraining; notably, simple LSTM data augmentation via noisy versions of the training data was much less effective.
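To make the model concrete, here is a minimal forward simulation of a coupled van der Pol system. The exact parameterization used in the paper is not given in the abstract; the form below (per-unit damping `alpha`, frequency `omega`, and linear coupling matrix `W` acting on the positions) is an assumption, and the numeric values are made up. Parameter estimation (variable projection plus stochastic search) is not shown.

```python
import math

def coupled_vdp_rhs(x, y, alpha, omega, W):
    """Time derivatives of an ASSUMED coupled van der Pol system:
        dx_i/dt = y_i
        dy_i/dt = alpha_i * (1 - x_i**2) * y_i - omega_i**2 * x_i + sum_j W[i][j] * x_j
    W[i][j] is the influence of unit j on unit i (> 0 excitatory, < 0 inhibitory).
    """
    n = len(x)
    dx = list(y)
    dy = [
        alpha[i] * (1.0 - x[i] ** 2) * y[i]
        - omega[i] ** 2 * x[i]
        + sum(W[i][j] * x[j] for j in range(n))
        for i in range(n)
    ]
    return dx, dy

def _shift(v, k, h):
    # Elementwise v + h * k.
    return [vi + h * ki for vi, ki in zip(v, k)]

def simulate(x0, y0, alpha, omega, W, dt, steps):
    """Integrate with classical fourth-order Runge-Kutta; returns the x trajectory."""
    x, y = list(x0), list(y0)
    traj = [list(x)]
    for _ in range(steps):
        k1x, k1y = coupled_vdp_rhs(x, y, alpha, omega, W)
        k2x, k2y = coupled_vdp_rhs(_shift(x, k1x, dt / 2), _shift(y, k1y, dt / 2), alpha, omega, W)
        k3x, k3y = coupled_vdp_rhs(_shift(x, k2x, dt / 2), _shift(y, k2y, dt / 2), alpha, omega, W)
        k4x, k4y = coupled_vdp_rhs(_shift(x, k3x, dt), _shift(y, k3y, dt), alpha, omega, W)
        x = [xi + dt / 6 * (a + 2 * b + 2 * c + d) for xi, a, b, c, d in zip(x, k1x, k2x, k3x, k4x)]
        y = [yi + dt / 6 * (a + 2 * b + 2 * c + d) for yi, a, b, c, d in zip(y, k1y, k2y, k3y, k4y)]
        traj.append(list(x))
    return traj

# Two units: unit 0 excites unit 1, unit 1 inhibits unit 0 (made-up values).
traj = simulate([1.0, 0.0], [0.0, 0.5], alpha=[1.0, 0.5], omega=[1.0, 1.5],
                W=[[0.0, -0.3], [0.4, 0.0]], dt=0.01, steps=2000)
print(all(math.isfinite(v) for v in traj[-1]))
```

In a fitted model, the sign pattern of `W` is what would be read as excitatory or inhibitory interactions between subsystems.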

2018-12-31

(published)

www.semanticscholar.org

Learning deep representations by mutual information estimation and maximization

R Devon Hjelm

Alex Fedorov

Samuel Lavoie-Marchildon

Karan Grewal

Adam Trischler

Phil Bachman

Yoshua Bengio

This work investigates unsupervised learning of representations by maximizing mutual information between an input and the output of a deep neural network encoder. Importantly, we show that structure matters: incorporating knowledge about locality in the input into the objective can significantly improve a representation’s suitability for downstream tasks. We further control characteristics of the representation by matching to a prior distribution adversarially. Our method, which we call Deep InfoMax (DIM), outperforms a number of popular unsupervised learning methods and compares favorably with fully supervised learning on several classification tasks with some standard architectures. DIM opens new avenues for unsupervised learning of representations and is an important step towards flexible formulations of representation learning objectives for specific end goals.
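Maximizing mutual information requires a tractable estimator. The abstract does not say which one DIM uses; as an assumption, the sketch below shows an InfoNCE-style contrastive bound, one common choice. Given a critic score matrix where `scores[i][j]` rates input i against encoder output j (positive pairs on the diagonal), log N minus the cross-entropy loss is a lower bound on the mutual information. The function names are hypothetical.

```python
import math

def infonce_loss(scores):
    """Mean cross-entropy of picking the positive (diagonal) pair in each row."""
    n = len(scores)
    total = 0.0
    for i in range(n):
        row = scores[i]
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_sum_exp - row[i]
    return total / n

def infonce_mi_lower_bound(scores):
    """I(X; Y) >= log N - InfoNCE loss, with N the number of pairs in the batch."""
    return math.log(len(scores)) - infonce_loss(scores)

uninformative = [[0.0] * 4 for _ in range(4)]
informative = [[50.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(infonce_mi_lower_bound(uninformative), infonce_mi_lower_bound(informative))
```

Note the bound saturates at log N, so larger batches are needed to certify larger amounts of information.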

2018-12-31

ICLR (published)

openreview.net

Learning to Learn without Forgetting By Maximizing Transfer and Minimizing Interference

Matthew D Riemer

Ignacio Cases

Robert Ajemian

Miao Liu

Irina Rish

Yuhai Tu

Gerald Tesauro

Lack of performance in continual learning over non-stationary distributions of data remains a major challenge in scaling neural network learning to more human-realistic settings. In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples. We then propose a new algorithm, Meta-Experience Replay (MER), that directly exploits this view by combining experience replay with optimization-based meta-learning. This method learns parameters that make interference based on future gradients less likely and transfer based on future gradients more likely. We conduct experiments across continual lifelong supervised learning benchmarks and non-stationary reinforcement learning environments, demonstrating that our approach consistently outperforms recently proposed baselines for continual learning. Our experiments show that the gap between the performance of MER and baseline algorithms grows both as the environment becomes more non-stationary and as the fraction of the total experiences stored gets smaller.
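The transfer/interference view can be stated per pair of examples: if the gradients of their losses point in similar directions (positive dot product), a step on one helps the other (transfer); if negative, it hurts (interference). The sketch below computes this for a linear model with squared loss and forms an alignment-regularized loss. The weighting `alpha` and all function names are hypothetical, and MER itself reaches this objective indirectly through an optimization-based meta-learning update over replayed examples, which is not reproduced here.

```python
def sq_loss(w, x, y):
    """0.5 * (w . x - y)**2 for a linear model."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return 0.5 * err * err

def grad_sq_loss(w, x, y):
    """Gradient of sq_loss with respect to w."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]

def alignment(w, ex_a, ex_b):
    """Dot product of two per-example gradients: > 0 suggests transfer, < 0 interference."""
    ga, gb = grad_sq_loss(w, *ex_a), grad_sq_loss(w, *ex_b)
    return sum(a * b for a, b in zip(ga, gb))

def alignment_regularized_loss(w, batch, alpha):
    """Mean loss minus alpha times the mean pairwise gradient alignment."""
    mean_loss = sum(sq_loss(w, x, y) for x, y in batch) / len(batch)
    pairs = [(i, j) for i in range(len(batch)) for j in range(i + 1, len(batch))]
    mean_align = sum(alignment(w, batch[i], batch[j]) for i, j in pairs) / max(len(pairs), 1)
    return mean_loss - alpha * mean_align

w = [0.0, 0.0]
same = ([1.0, 0.0], 1.0)
conflicting = ([1.0, 0.0], -1.0)  # same input, opposite target
print(alignment(w, same, same), alignment(w, same, conflicting))
```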

2018-12-31

ICLR.cc/2019/Conference (poster)

openreview.net

Learning Reliable Policies in the Bandit Setting with Application to Adaptive Clinical Trials

Hossein Aboutalebi

Doina Precup

Tibor Schuster

The stochastic multi-armed bandit problem is a well-known model for studying the exploration-exploitation trade-off. It has significant possible applications in adaptive clinical trials, which allow for a dynamic change of patient allocation ratios. However, most bandit learning algorithms are designed with the goal of minimizing the expected regret. While this approach is useful in many areas, in clinical trials it can be sensitive to outlier data, especially when the sample size is small. In this article, we propose a modification of the BESA algorithm [Baransi, Maillard, and Mannor, 2014] which takes into account the variance in the action outcomes in addition to the mean. We present a regret bound for our approach and evaluate it empirically both on synthetic problems and on a dataset from the clinical trial literature. Our approach compares favorably to a suite of standard bandit algorithms.
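BESA compares two arms by sub-sampling, without replacement, each arm's history down to the size of the less-explored arm and then comparing the resulting averages. The sketch below shows that two-arm duel, assuming ties go to the arm with fewer observations. The `risk_aversion` term (mean minus a multiple of the sub-sample variance) is a hypothetical stand-in for "taking variance into account"; the abstract does not give the authors' actual modification.

```python
import random
from statistics import fmean, pvariance

def besa_duel(hist_a, hist_b, rng, risk_aversion=0.0):
    """Return 0 to pull arm a or 1 to pull arm b. Each arm needs at least one observation.

    risk_aversion is HYPOTHETICAL: scores are mean - risk_aversion * variance of the sub-sample.
    """
    n = min(len(hist_a), len(hist_b))
    sub_a = rng.sample(hist_a, n)  # sub-sample without replacement to equalize sizes
    sub_b = rng.sample(hist_b, n)

    def score(xs):
        return fmean(xs) - risk_aversion * pvariance(xs)

    sa, sb = score(sub_a), score(sub_b)
    if sa == sb:  # assumed tie-break: prefer the less-explored arm
        return 0 if len(hist_a) <= len(hist_b) else 1
    return 0 if sa > sb else 1

rng = random.Random(0)
volatile = [0.0, 2.0, 0.0, 2.0]  # mean 1.0, variance 1.0
steady = [0.9, 0.9, 0.9, 0.9]    # mean 0.9, variance 0.0
print(besa_duel(volatile, steady, rng), besa_duel(volatile, steady, rng, risk_aversion=1.0))
```

With no risk aversion the higher-mean but volatile arm wins; with a variance penalty the steadier arm is preferred, which is the kind of behavior the abstract motivates for small clinical samples.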

2018-12-31

KHD@IJCAI (published)

dblp.uni-trier.de

Learning representations of Logical Formulae using Graph Neural Networks

Xavier Glorot

Ankit Anand

Eser Aygün

Shibl Mourad

Pushmeet Kohli

Doina Precup

We explore the use of Graph Neural Networks (GNNs) for learning representations of propositional and first-order logical formulae. Traditional non-graph approaches such as CNNs and LSTMs do not exploit invariant properties, such as invariance to variable renaming and order invariance, that are predominantly present in logical formulae. In this work, we explicitly try to encode these logical invariances using GNNs. We use the entailment task proposed in Evans et al. [2018] for propositional logic. We also explore our approach on the task of proof length prediction in first-order logic, using the Mizar-40 dataset to evaluate several representation learning approaches for this task. We observe that GNNs significantly outperform the other traditional approaches on both tasks.
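One way a graph encoding can build in variable-renaming invariance is to give every distinct variable a single node with the same anonymous label, so consistently renaming variables yields an identical graph. The toy sketch below does this for propositional formulae written as nested tuples, then runs a few rounds of untrained sum-aggregation message passing with a sum readout. The tuple format, `VOCAB`, and the untrained aggregation are assumptions for illustration, not the architecture from the paper.

```python
from collections import defaultdict

VOCAB = ["VAR", "not", "and", "or", "implies"]

def formula_to_graph(formula):
    """Return (labels, edges); each distinct variable becomes ONE node labelled 'VAR'."""
    labels, edges, var_ids = [], [], {}

    def visit(node):
        if isinstance(node, str):  # a propositional variable
            if node not in var_ids:
                var_ids[node] = len(labels)
                labels.append("VAR")
            return var_ids[node]
        op, *args = node
        idx = len(labels)
        labels.append(op)
        for arg in args:
            edges.append((idx, visit(arg)))
        return idx

    visit(formula)
    return labels, edges

def embed(formula, rounds=2):
    """Untrained message passing: h_v <- h_v + sum of neighbour states; sum readout."""
    labels, edges = formula_to_graph(formula)
    h = [[1.0 if v == lab else 0.0 for v in VOCAB] for lab in labels]  # one-hot labels
    nbrs = defaultdict(list)
    for a, b in edges:  # treat edges as undirected
        nbrs[a].append(b)
        nbrs[b].append(a)
    for _ in range(rounds):
        h = [[h[v][k] + sum(h[u][k] for u in nbrs[v]) for k in range(len(VOCAB))]
             for v in range(len(h))]
    return [sum(col) for col in zip(*h)]

f1 = ("and", ("or", "p", "q"), ("not", "p"))
f2 = ("and", ("or", "x", "y"), ("not", "x"))  # same formula with variables renamed
print(embed(f1) == embed(f2))
```

Because neighbours are summed, argument order is also ignored, which suits commutative operators such as "and" but would wrongly identify `p implies q` with `q implies p`; a real encoder needs edge types or positions for non-commutative operators.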

2018-12-31

(published)

www.semanticscholar.org

LF-PPL: A Low-Level First Order Probabilistic Programming Language for Non-Differentiable Models

Yuanshuo Zhou

Bradley Gram-Hansen

Tobias Kohn

Tom Rainforth

Hongseok Yang

Frank N. Wood

We develop a new Low-level, First-order Probabilistic Programming Language (LF-PPL) suited for models containing a mix of continuous, discrete, and/or piecewise-continuous variables. The key strength of this language and its compilation scheme is its ability to automatically distinguish the parameters with respect to which the density function is discontinuous, while further providing runtime checks for boundary crossings. This enables the introduction of new inference engines that are able to exploit gradient information while remaining efficient for models that are not everywhere differentiable. We demonstrate this ability by incorporating a discontinuous Hamiltonian Monte Carlo (DHMC) inference engine that delivers automated and efficient inference for non-differentiable models. Our system is backed by a mathematical formalism that ensures that any model expressed in this language has a density whose discontinuities form a set of measure zero, maintaining the validity of the inference engine.
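For parameters the density is discontinuous in, discontinuous HMC can update coordinates one at a time with Laplace-distributed momentum, whose kinetic energy is |p_i|. The sketch below shows that coordinatewise step as we understand it from the DHMC literature (unit mass assumed): move by a fixed step in the direction of the momentum, and either pay the potential change out of the momentum or, if there is not enough, reflect. It is not LF-PPL's implementation; `potential` and the step-function example are hypothetical.

```python
import math

def dhmc_coordinate_step(theta, p, i, eps, potential):
    """One coordinatewise DHMC update of coordinate i.

    Assumes Laplace momentum with kinetic energy K(p_i) = |p_i| (unit mass), so
    potential(theta) + sum(|p_j|) is conserved exactly even across discontinuities.
    """
    direction = math.copysign(1.0, p[i])
    proposal = list(theta)
    proposal[i] += eps * direction
    delta_u = potential(proposal) - potential(theta)
    new_p = list(p)
    if abs(p[i]) > delta_u:
        # Enough kinetic energy to cross: move and pay delta_u out of |p_i|.
        new_p[i] = p[i] - direction * delta_u
        return proposal, new_p
    # Not enough energy: stay put and reflect the momentum.
    new_p[i] = -p[i]
    return list(theta), new_p

def step_potential(theta):
    # Piecewise-constant potential with a jump of 2.0 at theta[0] = 0.5.
    return 0.0 if theta[0] < 0.5 else 2.0

print(dhmc_coordinate_step([0.4], [1.0], 0, 0.2, step_potential))  # reflects
print(dhmc_coordinate_step([0.4], [3.0], 0, 0.2, step_potential))  # crosses the jump
```

The runtime boundary-crossing checks mentioned in the abstract are what tell an engine like this when a move has crossed a discontinuity.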

2018-12-31

AISTATS (published)

proceedings.mlr.press

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

Kundan Kumar

Rithesh Kumar

Thibault de Boissiere

Lucas Gestin

Wei Zhen Teoh

Jose Sotelo

Alexandre De Brébisson

Yoshua Bengio

Aaron Courville

2018-12-31

Neural Information Processing Systems (published)

doi.org

arxiv.org

Meta-Learning State-based Eligibility Traces for More Sample-Efficient Policy Evaluation

Mingde Zhao

Ian Porada

Sitao Luan

Xiao-Wen Chang

Doina Precup

Temporal-Difference (TD) learning is a standard and very successful reinforcement learning approach, at the core of both algorithms that learn the value of a given policy and algorithms that learn how to improve policies. TD-learning with eligibility traces provides a way to boost sample efficiency by temporal credit assignment, i.e., deciding which portion of a reward should be assigned to predecessor states that occurred at different previous times, controlled by a parameter
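For reference, tabular TD(λ) policy evaluation with accumulating eligibility traces can be sketched as follows. To mirror the state-based setting in the title, the trace decay is taken from a per-state function `lam(state)`; here that function is simply supplied by hand as a hypothetical stand-in, whereas the paper meta-learns it. The episode format of (state, reward, next_state) tuples is also an assumption.

```python
def td_lambda_episode(V, episode, alpha, gamma, lam):
    """Online tabular TD(lambda) with accumulating traces over one episode.

    episode: list of (state, reward, next_state), with next_state None at termination.
    lam: function state -> trace decay in [0, 1] (state-dependent; hypothetical here).
    """
    traces = {}
    for s, r, s_next in episode:
        v_next = 0.0 if s_next is None else V.get(s_next, 0.0)
        delta = r + gamma * v_next - V.get(s, 0.0)  # TD error
        decay = gamma * lam(s)  # e_t = gamma * lambda(S_t) * e_{t-1} + 1[S_t]
        for state in traces:
            traces[state] *= decay
        traces[s] = traces.get(s, 0.0) + 1.0
        for state, e in traces.items():  # credit all recently visited states
            V[state] = V.get(state, 0.0) + alpha * delta * e
    return V

episode = [("A", 0.0, "B"), ("B", 1.0, None)]
full = td_lambda_episode({}, episode, alpha=0.5, gamma=1.0, lam=lambda s: 1.0)
one_step = td_lambda_episode({}, episode, alpha=0.5, gamma=1.0, lam=lambda s: 0.0)
print(full, one_step)
```

With λ = 1 the final reward is credited back to state A within the same episode; with λ = 0 the update reduces to one-step TD and A learns nothing yet, which is the credit-assignment trade-off the parameter controls.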

2018-12-31

arXiv (preprint)

doi.org

arxiv.org
