Probability Distillation: A Caveat and Alternatives
Chin-Wei Huang
Faruk Ahmed
Kundan Kumar
Alexandre Lacoste
Due to Van den Oord et al. (2018), probability distillation has recently been of interest to deep learning practitioners, where, as a practical workaround for deploying autoregressive models in real-time applications, a student network is used to obtain quality samples in parallel. We identify a pathological optimization issue with the adopted stochastic minimization of the reverse-KL divergence: the curse of dimensionality results in a skewed gradient distribution that renders training inefficient. This means that KL-based “evaluative” training can be susceptible to poor exploration if the target distribution is highly structured. We then explore alternative principles for distillation, including one with an “instructive” signal, and show that it is possible to achieve qualitatively better results than with KL minimization.
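To make the reverse-KL objective concrete, here is a minimal, hypothetical sketch (not the authors' setup, which distills an autoregressive teacher): a one-dimensional Gaussian student is fit to a fixed two-component teacher by stochastically minimizing KL(q||p) with reparameterized student samples. The teacher mixture, the Gaussian student, and all hyperparameters are illustrative choices.

```python
import torch

# Illustrative teacher: a fixed two-mode Gaussian mixture, standing in for a model
# whose density we can evaluate but cannot sample from quickly.
def teacher_log_prob(x):
    comps = torch.stack([
        torch.distributions.Normal(-2.0, 0.5).log_prob(x),
        torch.distributions.Normal(2.0, 0.5).log_prob(x),
    ], dim=0)
    return torch.logsumexp(comps, dim=0) - torch.log(torch.tensor(2.0))

# Student: a single Gaussian with learnable mean and log-scale.
mu = torch.zeros(1, requires_grad=True)
log_sigma = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=1e-2)

for step in range(2000):
    q = torch.distributions.Normal(mu, log_sigma.exp())
    x = q.rsample((256,))                        # reparameterized student samples
    # Stochastic reverse KL: E_q[log q(x) - log p(x)], the "evaluative" signal.
    loss = (q.log_prob(x) - teacher_log_prob(x)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(float(mu), float(log_sigma.exp()))         # student typically locks onto one mode
```

Because every sample comes from the student, regions of the teacher that the student never visits contribute nothing to the gradient, which illustrates the poor-exploration behaviour the abstract warns about for highly structured targets.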
Quaternion Recurrent Neural Networks
Titouan Parcollet
Mohamed Morchid
Georges Linarès
Chiheb Trabelsi
Renato De Mori
Recurrent neural networks (RNNs) are powerful architectures for modeling sequential data, owing to their capability to learn short- and long-term dependencies between the basic elements of a sequence. Nonetheless, popular tasks such as speech or image recognition involve multi-dimensional input features that are characterized by strong internal dependencies between the dimensions of the input vector. We propose a novel quaternion recurrent neural network (QRNN), along with a quaternion long short-term memory network (QLSTM), that take into account both the external relations and these internal structural dependencies via the quaternion algebra. Similarly to capsules, quaternions allow the QRNN to encode internal dependencies by composing and processing multidimensional features as single entities, while the recurrent operation reveals correlations between the elements composing the sequence. We show that both QRNN and QLSTM achieve better performance than RNN and LSTM in a realistic automatic speech recognition application. Finally, we show that QRNN and QLSTM reach better results while reducing the number of free parameters by a factor of up to 3.3x compared to real-valued RNNs and LSTMs, leading to a more compact representation of the relevant information.
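The core operation behind the parameter savings is the Hamilton product, which replaces a real matrix-vector product and lets a single weight quaternion act on all four components of an input quaternion at once. A small NumPy sketch (the weight and input values are made up for illustration):

```python
import numpy as np

def hamilton_product(q1, q2):
    """Hamilton product of two quaternions given as (r, i, j, k) components."""
    r1, i1, j1, k1 = q1
    r2, i2, j2, k2 = q2
    return np.array([
        r1 * r2 - i1 * i2 - j1 * j2 - k1 * k2,   # real part
        r1 * i2 + i1 * r2 + j1 * k2 - k1 * j2,   # i part
        r1 * j2 - i1 * k2 + j1 * r2 + k1 * i2,   # j part
        r1 * k2 + i1 * j2 - j1 * i2 + k1 * r2,   # k part
    ])

# One quaternion "weight" mixes all four components of a quaternion input, so a
# single parameter quadruple plays the role of a 4x4 block of real weights; this
# sharing is the source of the parameter savings reported for QRNN/QLSTM.
w = np.array([0.5, -0.1, 0.3, 0.2])   # hypothetical weight quaternion
x = np.array([1.0, 2.0, -1.0, 0.5])   # one 4-D input feature viewed as a quaternion
print(hamilton_product(w, x))
```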
Recall Traces: Backtracking Models for Efficient Reinforcement Learning
Anirudh Goyal
Philemon Brakel
William Fedus
Soumye Singhal
Timothy P. Lillicrap
Sergey Levine
In many environments only a tiny subset of all states yield high reward. In these cases, few of the interactions with the environment provide a relevant learning signal. Hence, we may want to preferentially train on those high-reward states and the probable trajectories leading to them. To this end, we advocate for the use of a backtracking model that predicts the preceding states that terminate at a given high-reward state. We can train a model which, starting from a high-value state (or one that is estimated to have high value), predicts and samples which (state, action) tuples may have led to that high-value state. These traces of (state, action) pairs, which we refer to as Recall Traces, are informative as they terminate in good states, and hence we can use them to improve a policy. We provide a variational interpretation for this idea and a practical algorithm in which the backtracking model samples from an approximate posterior distribution over trajectories which lead to large rewards. Our method improves the sample efficiency of both on- and off-policy RL algorithms across several environments and tasks.
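As a toy illustration of the idea (not the paper's algorithm, which learns the backtracking model and uses a variational objective), the sketch below rolls a hypothetical backward model out from a high-value state to produce a Recall Trace; the integer-chain dynamics and the backward model are made up.

```python
import random

def sample_recall_trace(backward_model, high_value_state, length):
    """Roll a backtracking model backwards from a high-value state.

    `backward_model(s)` is assumed to return one sampled (prev_state, action)
    pair, i.e. a draw from an approximate p(s_{t-1}, a_{t-1} | s_t).
    The trace is returned in forward (chronological) order.
    """
    trace, s = [], high_value_state
    for _ in range(length):
        prev_s, a = backward_model(s)
        trace.append((prev_s, a))
        s = prev_s
    trace.reverse()
    return trace

# Toy backward model on an integer chain where action +1/-1 moves the state:
# it guesses an action and inverts the (known) dynamics.
def toy_backward_model(s):
    a = random.choice([+1, -1])
    return s - a, a

random.seed(0)
print(sample_recall_trace(toy_backward_model, high_value_state=10, length=5))
```

Because every such trace ends in the chosen high-value state, it can then serve as an additional learning signal for the forward policy, as described in the abstract.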
Reducing the variance in online optimization by transporting past gradients
Sébastien M. R. Arnold
Pierre-Antoine Manzagol
Reza Babanezhad Harikandeh
Most stochastic optimization methods use gradients once before discarding them. While variance reduction methods have shown that reusing past gradients can be beneficial when there is a finite number of datapoints, they do not easily extend to the online setting. One issue is the staleness due to using past gradients. We propose to correct this staleness using the idea of implicit gradient transport (IGT) which transforms gradients computed at previous iterates into gradients evaluated at the current iterate without using the Hessian explicitly. In addition to reducing the variance and bias of our updates over time, IGT can be used as a drop-in replacement for the gradient estimate in a number of well-understood methods such as heavy ball or Adam. We show experimentally that it achieves state-of-the-art results on a wide range of architectures and benchmarks. Additionally, the IGT gradient estimator yields the optimal asymptotic convergence rate for online stochastic optimization in the restricted setting where the Hessians of all component functions are equal.
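Below is a small sketch of the transport idea on a noisy quadratic, using one natural reading of the abstract: the new stochastic gradient is evaluated at an extrapolated point so that averaging it with past gradients remains approximately unbiased at the current iterate. The specific extrapolation and the averaging weight gamma_t = t/(t+1) are our assumptions, not necessarily the paper's exact algorithm.

```python
import numpy as np

# Toy online problem: noisy gradients of f(theta) = 0.5 * ||theta - target||^2.
rng = np.random.default_rng(0)
target = np.array([3.0, -1.0])

def stochastic_grad(theta):
    return (theta - target) + rng.normal(scale=1.0, size=theta.shape)

theta = np.zeros(2)
prev_theta = theta.copy()
v = np.zeros(2)            # running transported-gradient estimate
lr = 0.1
for t in range(500):
    gamma = t / (t + 1.0)                              # increasing averaging weight
    # Evaluate the new gradient at an extrapolated ("transported") point so the
    # running average stays anchored at the current iterate, without any Hessian.
    shifted = theta + (gamma / (1.0 - gamma)) * (theta - prev_theta)
    v = gamma * v + (1.0 - gamma) * stochastic_grad(shifted)
    prev_theta = theta.copy()
    theta = theta - lr * v                             # drop-in replacement for the SGD gradient

print(theta)               # close to target despite unit-variance gradient noise
```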
Reinforcement Learning for Sustainable Agriculture
Jonathan Binas
Leonie H. Luginbuehl
Modern machine learning methods have achieved superhuman performance on a variety of tasks, simply learning from the outcomes of their actions. We propose a path towards more sustainable agriculture by treating plant development as an optimization problem with respect to parameters such as yield and environmental impact, which can be optimized in an automated way. Specifically, we propose to use reinforcement learning to autonomously explore and learn ways of influencing the development of certain types of plants, controlling environmental parameters such as irrigation or nutrient supply, and receiving sensory feedback such as camera images, humidity, and moisture measurements. The trained system will thus be able to provide instructions for the optimal treatment of a local population of plants based on non-invasive measurements, such as imaging.
La science, un droit humain. Mettre en œuvre le principe d’une science participative, équitable et accessible à tous [Science, a human right: implementing the principle of a participatory, equitable science accessible to all]
Systematic Generalization: What Is Required and Can It Be Learned?
Dzmitry Bahdanau*
Shikhar Murty*
Michael Noukhovitch
Thien Huu Nguyen
Harm de Vries
On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length
Stanisław Jastrzębski
Zac Kenton
Nicolas Ballas
Asja Fischer
Amos Storkey
Stochastic Gradient Descent (SGD) based training of neural networks with a large learning rate or a small batch size typically ends in well-generalizing, flat regions of the weight space, as indicated by small eigenvalues of the Hessian of the training loss. However, the curvature along the SGD trajectory is poorly understood. An empirical investigation shows that initially SGD visits increasingly sharp regions, reaching a maximum sharpness determined by both the learning rate and the batch size of SGD. When studying the SGD dynamics in relation to the sharpest directions in this initial phase, we find that the SGD step is large compared to the curvature and commonly fails to minimize the loss along the sharpest directions. Furthermore, using a reduced learning rate along these directions can improve training speed while leading to both sharper and better-generalizing solutions compared to vanilla SGD. In summary, our analysis of the dynamics of SGD in the subspace of the sharpest directions shows that they influence the regions that SGD steers towards (where a larger learning rate or a smaller batch size results in wider regions being visited), the overall training speed, and the generalization ability of the final model.
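A standard way to measure the sharpness the abstract refers to is power iteration with Hessian-vector products; the sketch below estimates the curvature along the sharpest direction for a tiny stand-in model and compares it with the 2/learning-rate stability threshold of a quadratic. The model, data, and learning rate are placeholders, and this is a generic diagnostic rather than the paper's exact protocol.

```python
import torch

# Tiny model and batch, standing in for the network and training loss.
torch.manual_seed(0)
model = torch.nn.Linear(10, 1)
x, y = torch.randn(64, 10), torch.randn(64, 1)
loss = torch.nn.functional.mse_loss(model(x), y)

params = list(model.parameters())
grads = torch.autograd.grad(loss, params, create_graph=True)

def hvp(vec):
    """Hessian-vector product via double backprop."""
    dot = sum((g * v).sum() for g, v in zip(grads, vec))
    return torch.autograd.grad(dot, params, retain_graph=True)

# Power iteration for the top Hessian eigenvalue (the "sharpest direction").
v = [torch.randn_like(p) for p in params]
for _ in range(50):
    hv = hvp(v)
    norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
    v = [h / norm for h in hv]
eig = sum((h * u).sum() for h, u in zip(hvp(v), v)).item()

lr = 0.1
print(f"sharpest curvature ~ {eig:.3f}, stability threshold 2/lr = {2 / lr:.1f}")
# If the curvature exceeds 2/lr, a gradient step overshoots along that direction
# instead of decreasing the loss there, which is the regime the paper studies.
```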
The Termination Critic
Anna Harutyunyan
Will Dabney
Diana Borsa
Nicolas Heess
Remi Munos
In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents. We propose an algorithm that focuses on the termination function, as opposed to the policy, as is more common. The termination function is usually trained to optimize a control objective: an option ought to terminate if another has better value. We offer a different, information-theoretic perspective, and propose that terminations should focus instead on the compressibility of the option’s encoding, arguably a key reason for using abstractions. To achieve this algorithmically, we leverage the classical options framework and learn the option transition model as a “critic” for the termination function. Using this model, we derive gradients that optimize the desired criteria. We show that the resulting options are non-trivial, intuitively meaningful, and useful for learning.
Toward Requirements Specification for Machine-Learned Components
Mona Rahimi
Sahar Kokaly
Marsha Chechik
In current practice, the behavior of Machine-Learned Components (MLCs) is not sufficiently specified by the predefined requirements. Instead, they "learn" existing patterns from the available training data, and make predictions for unseen data when deployed. On the surface, their ability to extract patterns and to behave accordingly is specifically useful for hard-to-specify concepts in certain safety critical domains (e.g., the definition of a pedestrian in a pedestrian detection component in a vehicle). However, the lack of requirements specifications on their behaviors makes further software engineering tasks challenging for such components. This is especially concerning for tasks such as safety assessment and assurance. In this position paper, we call for more attention from the requirements engineering community on supporting the specification of requirements for MLCs in safety critical domains. Towards that end, we propose an approach to improve the process of requirements specification in which an MLC is developed and operates by explicitly specifying domain-related concepts. Our approach extracts a universally accepted benchmark for hard-to-specify concepts (e.g., "pedestrian") and can be used to identify gaps in the associated dataset and the constructed machine-learned model.
Towards Jumpy Planning
Akilesh
Suriya Singh
Anirudh Goyal
Alexander Neitz
Model-free reinforcement learning (RL) is a powerful paradigm for learning complex tasks but suffers from high sample inefficiency as well as ignorance of the environment dynamics. On the other hand, a model-based RL agent learns dynamical causal models of the environment and uses them to plan. However, using a model at the scale of individual time-steps (usually tens of milliseconds) is mostly infeasible in practice, due to compounding prediction errors and the computational cost of the vast number of model queries required during planning. We propose to use a model-based planner together with a goal-conditioned policy trained with model-free learning. The model-based planner operates at higher levels of abstraction, i.e., decision states, and model-free RL is used between the decision states. We validate our approach in terms of transfer and generalization performance and show that it improves over a model-based planner that jumps to states a fixed number of timesteps ahead.
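The division of labour can be illustrated with a toy sketch: a planner proposes the next decision state several steps ahead ("jumpy" planning), and a goal-conditioned low-level controller bridges the gap step by step. The 1-D chain, the fixed jump size, and the greedy goal policy below are hypothetical stand-ins for the learned components.

```python
# Toy 1-D chain: the state is an integer position, the goal sits at position 9.
class ChainEnv:
    def __init__(self, goal=9):
        self.goal = goal
    def reset(self):
        self.s = 0
        return self.s
    def step(self, action):            # action in {-1, +1}
        self.s += action
        done = self.s == self.goal
        return self.s, float(done), done

def planner(state, goal=9, jump=3):
    """Jumpy planner: proposes the next decision state several steps ahead."""
    return min(state + jump, goal)

def goal_policy(state, subgoal):
    """Goal-conditioned low-level policy: greedy move toward the subgoal."""
    return 1 if subgoal > state else -1

env = ChainEnv()
state, done = env.reset(), False
while not done:
    subgoal = planner(state)                           # plan in jumps, not single steps
    while state != subgoal and not done:
        state, _, done = env.step(goal_policy(state, subgoal))
print("reached", state)
```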