Publications

Recall Traces: Backtracking Models for Efficient Reinforcement Learning

Anirudh Goyal

Philemon Brakel

William Fedus

Soumye Singhal

Timothy P. Lillicrap

Sergey Levine

Hugo Larochelle

Yoshua Bengio

In many environments only a tiny subset of all states yield high reward. In these cases, few of the interactions with the environment provid… (voir plus)e a relevant learning signal. Hence, we may want to preferentially train on those high-reward states and the probable trajectories leading to them. To this end, we advocate for the use of a backtracking model that predicts the preceding states that terminate at a given high-reward state. We can train a model which, starting from a high value state (or one that is estimated to have high value), predicts and sample for which the (state, action)-tuples may have led to that high value state. These traces of (state, action) pairs, which we refer to as Recall Traces, sampled from this backtracking model starting from a high value state, are informative as they terminate in good states, and hence we can use these traces to improve a policy. We provide a variational interpretation for this idea and a practical algorithm in which the backtracking model samples from an approximate posterior distribution over trajectories which lead to large rewards. Our method improves the sample efficiency of both on- and off-policy RL algorithms across several environments and tasks.

2019-01-01

ICLR.cc/2019/Conference (poster)

openreview.net

Reducing the variance in online optimization by transporting past gradients

Sébastien M. R. Arnold

Pierre-Antoine Manzagol

Reza Babanezhad Harikandeh

Ioannis Mitliagkas

Nicolas Le Roux

Most stochastic optimization methods use gradients once before discarding them. While variance reduction methods have shown that reusing pas… (voir plus)t gradients can be beneficial when there is a finite number of datapoints, they do not easily extend to the online setting. One issue is the staleness due to using past gradients. We propose to correct this staleness using the idea of implicit gradient transport (IGT) which transforms gradients computed at previous iterates into gradients evaluated at the current iterate without using the Hessian explicitly. In addition to reducing the variance and bias of our updates over time, IGT can be used as a drop-in replacement for the gradient estimate in a number of well-understood methods such as heavy ball or Adam. We show experimentally that it achieves state-of-the-art results on a wide range of architectures and benchmarks. Additionally, the IGT gradient estimator yields the optimal asymptotic convergence rate for online stochastic optimization in the restricted setting where the Hessians of all component functions are equal.

arxiv.org

Reinforcement Learning for Sustainable Agriculture

Jonathan Binas

Leonie H. Luginbuehl

Yoshua Bengio

Modern machine learning methods have achieved superhuman performance on a variety of tasks, simply learning from the outcomes of their actio… (voir plus)ns. We propose a path towards more sustainable agriculture, considering plant development an optimization problem with respect to certain parameters, such as yield and environmental impact, which can be optimized in an automated way. Specifically, we propose to use reinforcement learning to autonomously explore and learn ways of influencing the development of certain types of plants, controlling environmental parameters, such as irrigation or nutrient supply, and receiving sensory feedback, such as camera images, humidity, and moisture measurements. The trained system will thus be able to provide instructions for optimal treatment of a local population of plants, based on non-invasive measurements, such as imaging.

La science, un droit humain. Mettre en œuvre le principe d’une science participative, équitable et accessible à tous

Catherine Régis

Systematic Generalization: What Is Required and Can It Be Learned?

Dzmitry Bahdanau*

Shikhar Murty*

Michael Noukhovitch

Thien Huu Nguyen

Harm de Vries

Aaron Courville

2019-01-01

ICLR.cc/2019/Conference (poster)

openreview.net

On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length

Stanisław Jastrzębski

Zac Kenton

Nicolas Ballas

Asja Fischer

Yoshua Bengio

Amos Storkey

Stochastic Gradient Descent (SGD) based training of neural networks with a large learning rate or a small batch-size typically ends in well-… (voir plus)generalizing, flat regions of the weight space, as indicated by small eigenvalues of the Hessian of the training loss. However, the curvature along the SGD trajectory is poorly understood. An empirical investigation shows that initially SGD visits increasingly sharp regions, reaching a maximum sharpness determined by both the learning rate and the batch-size of SGD. When studying the SGD dynamics in relation to the sharpest directions in this initial phase, we find that the SGD step is large compared to the curvature and commonly fails to minimize the loss along the sharpest directions. Furthermore, using a reduced learning rate along these directions can improve training speed while leading to both sharper and better generalizing solutions compared to vanilla SGD. In summary, our analysis of the dynamics of SGD in the subspace of the sharpest directions shows that they influence the regions that SGD steers to (where larger learning rate or smaller batch size result in wider regions visited), the overall training speed, and the generalization ability of the final model.

2019-01-01

ICLR.cc/2019/Conference (poster)

openreview.net

The Termination Critic

Anna Harutyunyan

Will Dabney

Diana Borsa

Nicolas Heess

Remi Munos

Doina Precup

In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents. We… (voir plus) propose an algorithm that focuses on the termination function, as opposed to - as is common - the policy. The termination function is usually trained to optimize a control objective: an option ought to terminate if another has better value. We offer a different, information-theoretic perspective, and propose that terminations should focus instead on the compressibility of the option’s encoding - arguably a key reason for using abstractions.To achieve this algorithmically, we leverage the classical options framework, and learn the option transition model as a “critic” for the termination function. Using this model, we derive gradients that optimize the desired criteria. We show that the resulting options are non-trivial, intuitively meaningful, and useful for learning.

2019-01-01

AISTATS (publié)

arxiv.org

Toward Requirements Specification for Machine-Learned Components

Mona Rahimi

Jin Guo

Sahar Kokaly

Marsha Chechik

In current practice, the behavior of Machine-Learned Components (MLCs) is not sufficiently specified by the predefined requirements. Instead… (voir plus), they "learn" existing patterns from the available training data, and make predictions for unseen data when deployed. On the surface, their ability to extract patterns and to behave accordingly is specifically useful for hard-to-specify concepts in certain safety critical domains (e.g., the definition of a pedestrian in a pedestrian detection component in a vehicle). However, the lack of requirements specifications on their behaviors makes further software engineering tasks challenging for such components. This is especially concerning for tasks such as safety assessment and assurance. In this position paper, we call for more attention from the requirements engineering community on supporting the specification of requirements for MLCs in safety critical domains. Towards that end, we propose an approach to improve the process of requirements specification in which an MLC is developed and operates by explicitly specifying domain-related concepts. Our approach extracts a universally accepted benchmark for hard-to-specify concepts (e.g., "pedestrian") and can be used to identify gaps in the associated dataset and the constructed machine-learned model.

2019-01-01

2019 IEEE 27th International Requirements Engineering Conference Workshops (REW) (publié)

doi.org

Towards Jumpy Planning

Akilesh

Suriya Singh

Anirudh Goyal

Alexander Neitz

Aaron Courville

Model-free reinforcement learning (RL) is a powerful paradigm for learning complex tasks but suffers from high sample inefficiency as well a… (voir plus)s ignorance of the environment dynamics. On the other hand, a model-based RL agent learns dynamical causal models of the environment and uses them to plan. However, using a model at the scale of time-steps (usually tens of milliseconds) is mostly unfeasible in practice due to compounding prediction errors and computational requirements for making vast numbers of model queries during the planning process. We propose to use a modelbased planner together with a goal-conditioned policy trained with model-free learning. We use a model-based planner that operates at higher levels of abstraction i.e., decision states and use modelfree RL between the decision states. We validate our approach in terms of transfer and generalization performance and show that it leads to improvement over model-based planner that jumps to states that are fixed timesteps ahead.

2019-01-01

(publié)

www.semanticscholar.org

Towards Jumpy Planning

Akilesh

Suriya Singh

Anirudh Goyal

Alexander Neitz

Aaron Courville

Unsupervised State Representation Learning in Atari

Ankesh Anand

Evan Racah

Sherjil Ozair

Yoshua Bengio

Marc-Alexandre Côté

(Rex) Devon Hjelm

State representation learning, or the ability to capture latent generative factors of an environment, is crucial for building intelligent ag… (voir plus)ents that can perform a wide variety of tasks. Learning such representations without supervision from rewards is a challenging open problem. We introduce a method that learns state representations by maximizing mutual information across spatially and temporally distinct features of a neural encoder of the observations. We also introduce a new benchmark based on Atari 2600 games where we evaluate representations based on how well they capture the ground truth state variables. We believe this new framework for evaluating representation learning models will be crucial for future representation learning research. Finally, we compare our technique with other state-of-the-art generative and contrastive representation learning methods. The code associated with this work is available at this https URL

arxiv.org

Updates of Equilibrium Prop Match Gradients of Backprop Through Time in an RNN with Static Input

Maxence Ernoult

Julie Grollier

Damien Querlioz

Yoshua Bengio

Benjamin Scellier

Equilibrium Propagation (EP) is a biologically inspired learning algorithm for convergent recurrent neural networks, i.e. RNNs that are fed … (voir plus)by a static input x and settle to a steady state. Training convergent RNNs consists in adjusting the weights until the steady state of output neurons coincides with a target y. Convergent RNNs can also be trained with the more conventional Backpropagation Through Time (BPTT) algorithm. In its original formulation EP was described in the case of real-time neuronal dynamics, which is computationally costly. In this work, we introduce a discrete-time version of EP with simplified equations and with reduced simulation time, bringing EP closer to practical machine learning tasks. We first prove theoretically, as well as numerically that the neural and weight updates of EP, computed by forward-time dynamics, are step-by-step equal to the ones obtained by BPTT, with gradients computed backward in time. The equality is strict when the transition function of the dynamics derives from a primitive function and the steady state is maintained long enough. We then show for more standard discrete-time neural network dynamics that the same property is approximately respected and we subsequently demonstrate training with EP with equivalent performance to BPTT. In particular, we define the first convolutional architecture trained with EP achieving ~ 1% test error on MNIST, which is the lowest error reported with EP. These results can guide the development of deep neural networks trained with EP.

arxiv.org

À la hauteur du moment

Perspectives sur l’IA pour les responsables des politiques

Mila Techaide 2025

Développement du groupe d'experts de l'ONU sur l'IA

Transition à la direction scientifique de Mila

À la hauteur du moment

Perspectives sur l’IA pour les responsables des politiques

Publications

À la hauteur du moment

Perspectives sur l’IA pour les responsables des politiques

Mila Techaide 2025

Développement du groupe d'experts de l'ONU sur l'IA

Transition à la direction scientifique de Mila

À la hauteur du moment

Perspectives sur l’IA pour les responsables des politiques

Mots-clés populaires:

Publications