Riashat Islam

RE-EVALUATE: Reproducibility in Evaluating Reinforcement Learning Algorithms

Reinforcement learning (RL) has recently achieved tremendous success in solving complex tasks. Careful considerations are made towards reproducible research in machine learning. Reproducibility in RL often becomes more difficult, due to the lack of standard evaluation method and detailed methodology for algorithms and comparisons with existing work. In this work, we highlight key differences in evaluation in RL compared to supervised learning, and discuss specific issues that are often non-intuitive for newcomers. We study the importance of reproducibility in evaluation in RL, and propose an evaluation pipeline that can be decoupled from the algorithm code. We hope such an evaluation pipeline can be standardized, as a step towards robust and reproducible research in RL.

2018-06-26

ICML.cc/2018/RML (poster)

openreview.net

Deep Reinforcement Learning that Matters

Philip Bachman

In recent years, significant progress has been made in solving challenging problems across various domains using deep reinforcement learning (RL). Reproducing existing work and accurately judging the improvements offered by novel methods is vital to sustaining this progress. Unfortunately, reproducing results for state-of-the-art deep RL methods is seldom straightforward. In particular, non-determinism in standard benchmark environments, combined with variance intrinsic to the methods, can make reported results tough to interpret. Without significance metrics and tighter standardization of experimental reporting, it is difficult to determine whether improvements over the prior state-of-the-art are meaningful. In this paper, we investigate challenges posed by reproducibility, proper experimental techniques, and reporting procedures. We illustrate the variability in reported metrics and results when comparing against common baselines and suggest guidelines to make future results in deep RL more reproducible. We aim to spur discussion about how to ensure continued progress in the field by minimizing wasted effort stemming from results that are non-reproducible and easily misinterpreted.

2018-04-28

Proceedings of the AAAI Conference on Artificial Intelligence (published)

doi.org

arxiv.org

Bayesian Hypernetworks

David M. Krueger

Chin-wei Huang

Riashat Islam

Ryan Turner

Alexandre Lacoste

Aaron Courville

We propose Bayesian hypernetworks: a framework for approximate Bayesian inference in neural networks. A Bayesian hypernetwork, h, is a neural network which learns to transform a simple noise distribution, p(e) = N(0,I), to a distribution q(t) := q(h(e)) over the parameters t of another neural network (the "primary network"). We train q with variational inference, using an invertible h to enable efficient estimation of the variational lower bound on the posterior p(t | D) via sampling. In contrast to most methods for Bayesian deep learning, Bayesian hypernets can represent a complex multimodal approximate posterior with correlations between parameters, while enabling cheap iid sampling of q(t). In practice, Bayesian hypernets provide a better defense against adversarial examples than dropout, and also exhibit competitive performance on a suite of tasks which evaluate model uncertainty, including regularization, active learning, and anomaly detection.

2017-10-12

ArXiv (preprint)

arxiv.org

Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control

Policy gradient methods in reinforcement learning have become increasingly prevalent for state-of-the-art performance in continuous control tasks. Novel methods typically benchmark against a few key algorithms such as deep deterministic policy gradients and trust region policy optimization. As such, it is important to present and use consistent baseline experiments. However, this can be difficult due to general variance in the algorithms, hyper-parameter tuning, and environment stochasticity. We investigate and discuss: the significance of hyper-parameters in policy gradients for continuous control, general variance in the algorithms, and reproducibility of reported results. We provide guidelines on reporting novel results as comparisons against baseline methods such that future researchers can make informed decisions when investigating novel methods.

2017-06-16

ICML.cc/2017/RML (poster)

openreview.net
