We use cookies to analyze the browsing and usage of our website and to personalize your experience. You can disable these technologies at any time, but this may limit certain functionalities of the site. Read our Privacy Policy for more information.
Setting cookies
You can enable and disable the types of cookies you wish to accept. However certain choices you make could affect the services offered on our sites (e.g. suggestions, personalised ads, etc.).
Essential cookies
These cookies are necessary for the operation of the site and cannot be deactivated. (Still active)
Analytics cookies
Do you accept the use of cookies to measure the audience of our sites?
Multimedia Player
Do you accept the use of cookies to display and allow you to watch the video content hosted by our partners (YouTube, etc.)?
Publications
Accelerating Smooth Games by Manipulating Spectral Shapes
We present a reduction from reinforcement learning (RL) to no-regret online learning based on the saddle-point formulation of RL, by which "… (see more)any" online algorithm with sublinear regret can generate policies with provable performance guarantees. This new perspective decouples the RL problem into two parts: regret minimization and function approximation. The first part admits a standard online-learning analysis, and the second part can be quantified independently of the learning algorithm. Therefore, the proposed reduction can be used as a tool to systematically design new RL algorithms. We demonstrate this idea by devising a simple RL algorithm based on mirror descent and the generative-model oracle. For any
2020-06-03
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (published)
Recent advances in variational inference enable the modelling of highly structured joint distributions, but are limited in their capacity to… (see more) scale to the high-dimensional setting of stochastic neural networks. This limitation motivates a need for scalable parameterizations of the noise generation process, in a manner that adequately captures the dependencies among the various parameters. In this work, we address this need and present the Kronecker Flow, a generalization of the Kronecker product to invertible mappings designed for stochastic neural networks. We apply our method to variational Bayesian neural networks on predictive tasks, PAC-Bayes generalization bound estimation, and approximate Thompson sampling in contextual bandits. In all setups, our methods prove to be competitive with existing methods and better than the baselines.
2020-06-03
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (published)
Abstraction can improve the sample efficiency of reinforcement learning. However, the process of abstraction inherently discards information… (see more), potentially compromising an agent’s ability to represent high-value policies. To mitigate this, we here introduce combinations of state abstractions and options that are guaranteed to preserve the representation of near-optimal policies. We first define φ-relative options, a general formalism for analyzing the value loss of options paired with a state abstraction, and present necessary and sufficient conditions for φ-relative options to preserve near-optimal behavior in any finite Markov Decision Process. We further show that, under appropriate assumptions, φ-relative options can be composed to induce hierarchical abstractions that are also guaranteed to represent high-value policies.ion can improve the sample efficiency of reinforcement learning. However, the process of abstraction inherently discards information, potentially compromising an agent’s ability to represent high-value policies. To mitigate this, we here introduce combinations of state abstractions and options that are guaranteed to preserve the representation of near-optimal policies. We first define φ-relative options, a general formalism for analyzing the value loss of options paired with a state abstraction, and present necessary and sufficient conditions for φ-relative options to preserve near-optimal behavior in any finite Markov Decision Process. We further show that, under appropriate assumptions, φ-relative options can be composed to induce hierarchical abstractions that are also guaranteed to represent high-value policies.
2020-06-03
International Conference on Artificial Intelligence and Statistics (published)
Abstraction can improve the sample efficiency of reinforcement learning. However, the process of abstraction inherently discards information… (see more), potentially compromising an agent’s ability to represent high-value policies. To mitigate this, we here introduce combinations of state abstractions and options that are guaranteed to preserve the representation of near-optimal policies. We first define φ-relative options, a general formalism for analyzing the value loss of options paired with a state abstraction, and present necessary and sufficient conditions for φ-relative options to preserve near-optimal behavior in any finite Markov Decision Process. We further show that, under appropriate assumptions, φ-relative options can be composed to induce hierarchical abstractions that are also guaranteed to represent high-value policies.ion can improve the sample efficiency of reinforcement learning. However, the process of abstraction inherently discards information, potentially compromising an agent’s ability to represent high-value policies. To mitigate this, we here introduce combinations of state abstractions and options that are guaranteed to preserve the representation of near-optimal policies. We first define φ-relative options, a general formalism for analyzing the value loss of options paired with a state abstraction, and present necessary and sufficient conditions for φ-relative options to preserve near-optimal behavior in any finite Markov Decision Process. We further show that, under appropriate assumptions, φ-relative options can be composed to induce hierarchical abstractions that are also guaranteed to represent high-value policies.
2020-06-03
International Conference on Artificial Intelligence and Statistics (published)
We introduce a principled method to train end-to-end analog neural networks by stochastic gradient descent. In these analog neural networks,… (see more) the weights to be adjusted are implemented by the conductances of programmable resistive devices such as memristors [Chua, 1971], and the nonlinear transfer functions (or `activation functions') are implemented by nonlinear components such as diodes. We show mathematically that a class of analog neural networks (called nonlinear resistive networks) are energy-based models: they possess an energy function as a consequence of Kirchhoff's laws governing electrical circuits. This property enables us to train them using the Equilibrium Propagation framework [Scellier and Bengio, 2017]. Our update rule for each conductance, which is local and relies solely on the voltage drop across the corresponding resistor, is shown to compute the gradient of the loss function. Our numerical simulations, which use the SPICE-based Spectre simulation framework to simulate the dynamics of electrical circuits, demonstrate training on the MNIST classification task, performing comparably or better than equivalent-size software-based neural networks. Our work can guide the development of a new generation of ultra-fast, compact and low-power neural networks supporting on-chip learning.