On Efficiency in Hierarchical Reinforcement Learning
Zheng Wen
Morteza Ibrahimi
Andre Barreto
Benjamin Van Roy
Satinder Singh
Hierarchical Reinforcement Learning (HRL) approaches promise to provide more efficient solutions to sequential decision making problems, bo… (voir plus)th in terms of statistical as well as computational efficiency. While this has been demonstrated empirically over time in a variety of tasks, theoretical results quantifying the ben-efits of such methods are still few and far between. In this paper, we discuss the kind of structure in a Markov decision process which gives rise to efficient HRL methods. Specifically, we formalize the intuition that HRL can exploit well repeating "subMDPs", with similar reward and transition structure. We show that, under reasonable assumptions, a model-based Thompson sampling-style HRL algorithm that exploits this structure is statistically efficient, as established through a finite-time regret bound. We also establish conditions under which planning with structure-induced options is near-optimal and computationally efficient.
Electric Vehicles Equilibrium Model that Considers Queue Delay and Mixed Traffic
Nurit Oliker
Miguel F. Anjos
Bernard Gendron
This study develops an equilibrium model for electric vehicles (EVs) that considers both queue delays in charging stations and flow dependen… (voir plus)t travel times. This is a user equilibrium model that accounts for travel, charging and queuing time in the path choice modelling of EVs and the complementary traffic. Waiting and service times in charging stations are represented by an m/m/k queuing system. The model considers multiple vehicle and driver classes, expressing different battery capacity, initial charge state and range anxiety level. Feasible paths are found for multiple classes given their limited travel range. A numerical application exemplifies the limitations of EVs assignment and their impact on flow distribution.
AN ENSEMBLE APPROACH FOR DETECTING MACHINE FAILURE FROM SOUND Technical
Faruk Ahmed
Phong Cao Nguyen
We develop an ensemble-based approach for our submission to the anomaly detection challenge at DCASE 2020. The main members of our ensemble … (voir plus)are auto-encoders (with reconstruction error as the signal), classifiers (with negative predictive confidence as the signal), mismatch of the time-shifted signal with its Fourier-phase-shifted version, and a Gaussian mixture model on a set of common short-term features extracted from the waveform. The scores are passed through an exponential non-linearity and weighted to provide the final score, where the weighting and scaling hyper-parameters are learned on the development set. Our ensemble improves over the baseline on the development set.
An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay
Scott Fujimoto
Prioritized Experience Replay (PER) is a deep reinforcement learning technique in which agents learn from transitions sampled with non-unifo… (voir plus)rm probability proportionate to their temporal-difference error. We show that any loss function evaluated with non-uniformly sampled data can be transformed into another uniformly sampled loss function with the same expected gradient. Surprisingly, we find in some environments PER can be replaced entirely by this new loss function without impact to empirical performance. Furthermore, this relationship suggests a new branch of improvements to PER by correcting its uniformly sampled loss function equivalent. We demonstrate the effectiveness of our proposed modifications to PER and the equivalent loss function in several MuJoCo and Atari environments.
Expressiveness and Learning of Hidden Quantum Markov Models
Sandesh M. Adhikary
Siddarth Srinivasan
Byron Boots
Extending classical probabilistic reasoning using the quantum mechanical view of probability has been of recent interest, particularly in th… (voir plus)e development of hidden quantum Markov models (HQMMs) to model stochastic processes. However, there has been little progress in characterizing the expressiveness of such models and learning them from data. We tackle these problems by showing that HQMMs are a special subclass of the general class of observable operator models (OOMs) that do not suffer from the \emph{negative probability problem} by design. We also provide a feasible retraction-based learning algorithm for HQMMs using constrained gradient descent on the Stiefel manifold of model parameters. We demonstrate that this approach is faster and scales to larger models than previous learning algorithms.
Fairness in Kidney Exchange Programs through Optimal Solutions Enumeration
Not all patients who need kidney transplant can find a donor with compatible characteristics. Kidney exchange programs (KEPs) seek to match … (voir plus)such incompatible patient-donor pairs together, usually with the objective of maximizing the total number of transplants. We propose a randomized policy for selecting an optimal solution in which patients’ equity of opportunity to receive a transplant is promoted. Our approach gives rise to the problem of enumerating all optimal solutions, which we tackle using a hybrid of constraint programming and linear programming. We empirically demonstrate the advantages of our proposed method over the common practice of using the first optimal solution obtained by a solver.
Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation
Si Yi Meng
Sharan Vaswani
Issam Hadj Laradji
Mark Schmidt
We consider stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied b… (voir plus)y over-parameterized models. Under this condition, we show that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size. By growing the batch size for both the subsampled gradient and Hessian, we show that R-SSN can converge at a quadratic rate in a local neighbourhood of the solution. We also show that R-SSN attains local linear convergence for the family of self-concordant functions. Furthermore, we analyze stochastic BFGS algorithms in the interpolation setting and prove their global linear convergence. We empirically evaluate stochastic L-BFGS and a "Hessian-free" implementation of R-SSN for binary classification on synthetic, linearly-separable datasets and real datasets under a kernel mapping. Our experimental results demonstrate the fast convergence of these methods, both in terms of the number of iterations and wall-clock time.
Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents
Christian Rupprecht
Cyril Ibrahim
As deep reinforcement learning driven by visual perception becomes more widely used there is a growing need to better understand and probe t… (voir plus)he learned agents. Understanding the decision making process and its relationship to visual inputs can be very valuable to identify problems in learned behavior. However, this topic has been relatively under-explored in the research community. In this work we present a method for synthesizing visual inputs of interest for a trained agent. Such inputs or states could be situations in which specific actions are necessary. Further, critical states in which a very high or a very low reward can be achieved are often interesting to understand the situational awareness of the system as they can correspond to risky states. To this end, we learn a generative model over the state space of the environment and use its latent space to optimize a target function for the state of interest. In our experiments we show that this method can generate insights for a variety of environments and reinforcement learning methods. We explore results in the standard Atari benchmark games as well as in an autonomous driving simulator. Based on the efficiency with which we have been able to identify behavioural weaknesses with this technique, we believe this general approach could serve as an important tool for AI safety applications.
Forethought and Hindsight in Credit Assignment
Veronica Chelu
Hado van Hasselt
We address the problem of credit assignment in reinforcement learning and explore fundamental questions regarding the way in which an agent … (voir plus)can best use additional computation to propagate new information, by planning with internal models of the world to improve its predictions. Particularly, we work to understand the gains and peculiarities of planning employed as forethought via forward models or as hindsight operating with backward models. We establish the relative merits, limitations and complementary properties of both planning mechanisms in carefully constructed scenarios. Further, we investigate the best use of models in planning, primarily focusing on the selection of states in which predictions should be (re)-evaluated. Lastly, we discuss the issue of model estimation and highlight a spectrum of methods that stretch from explicit environment-dynamics predictors to more abstract planner-aware models.
GAIT: A Geometric Approach to Information Theory
Jose Gallego-Posada
Ankit Vani
Max Schwarzer
We advocate the use of a notion of entropy that reflects the relative abundances of the symbols in an alphabet, as well as the similarities … (voir plus)between them. This concept was originally introduced in theoretical ecology to study the diversity of ecosystems. Based on this notion of entropy, we introduce geometry-aware counterparts for several concepts and theorems in information theory. Notably, our proposed divergence exhibits performance on par with state-of-the-art methods based on the Wasserstein distance, but enjoys a closed-form expression that can be computed efficiently. We demonstrate the versatility of our method via experiments on a broad range of domains: training generative models, computing image barycenters, approximating empirical measures and counting modes.
GraphMix: Improved Training of Graph Neural Networks for Semi-Supervised Learning
Vikas Verma
Meng Qu
Alex Lamb
Juho Kannala
We present GraphMix , a regularized training scheme for Graph Neural Network based semi-supervised object classification, leveraging the re… (voir plus)cent advances in the regularization of classical deep neural networks. Specifically, we pro-pose a unified approach in which we train a fully-connected network jointly with the graph neural network via parameter sharing, interpolation-based regularization and self-predicted-targets. Our proposed method is architecture agnostic in the sense that it can be applied to any variant of graph neural networks which applies a parametric transformation to the features of the graph nodes. Despite its simplicity, with GraphMix we can consistently improve results and achieve or closely match state-of-the-art performance using even simpler architectures such as Graph Convolutional Networks, across three established graph benchmarks: Cora, Citeseer and Pubmed citation network datasets, as well as three newly proposed datasets :Cora-Full, Co-author-CS and Co-author-Physics.
How to make your optimizer generalize better
Sharan Vaswani
Reza Babenzhad
Sait AI Lab
Montreal.
Jose Gallego
Aaron Mishkin
We study the implicit regularization of optimization methods for linear models interpolating the training data in the under-parametrized and… (voir plus) over-parametrized regimes. For over-parameterized linear regression, where there are infinitely many interpolating solutions, different optimization methods can converge to solutions with varying generalization performance. In this setting, we show that projections onto linear spans can be used to move between solutions. Furthermore, via a simple reparameterization, we can ensure that an arbitrary optimizer converges to the minimum (cid:96) 2 -norm solution with favourable generalization properties. For under-parameterized linear clas-sification, optimizers can converge to different decision boundaries separating the data. We prove that for any such classifier, there exists a family of quadratic norms (cid:107)·(cid:107) P such that the classifier’s direction is the same as that of the maximum P -margin solution. We argue that analyzing convergence to the standard maximum (cid:96) 2 -margin is arbitrary and show that minimizing the norm induced by the data can result in better generalization. We validate our theoretical results via experiments on synthetic and real datasets.