Publications

Task dependent deep LDA pruning of neural networks

Qing Tian

Tal Arbel

James J. Clark

2020-11-23

Computer Vision and Image Understanding (published)

doi.org

AR-DAE: Towards Unbiased Neural Entropy Gradient Estimation

Jae Hyun Lim

Aaron Courville

Christopher Pal

Chin-wei Huang

Entropy is ubiquitous in machine learning, but it is in general intractable to compute the entropy of the distribution of an arbitrary continuous random variable. In this paper, we propose the amortized residual denoising autoencoder (AR-DAE) to approximate the gradient of the log density function, which can be used to estimate the gradient of entropy. Amortization allows us to significantly reduce the error of the gradient approximator by approaching asymptotic optimality of a regular DAE, in which case the estimation is in theory unbiased. We conduct theoretical and experimental analyses on the approximation error of the proposed method, as well as extensive studies on heuristics to ensure its robustness. Finally, using the proposed gradient approximator to estimate the gradient of entropy, we demonstrate state-of-the-art performance on density estimation with variational autoencoders and continuous control with soft actor-critic.
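The link between a denoising autoencoder and the gradient of the log density can be checked on a toy case. The sketch below is not AR-DAE itself: it assumes a one-dimensional Gaussian data distribution, for which the optimal denoiser is known in closed form, and compares the scaled residual (r(x) - x) / sigma^2 with the true score. All function and variable names are illustrative.

```python
import numpy as np

def optimal_denoiser_gaussian(x_noisy, data_std, noise_std):
    """Posterior mean E[x | x_noisy] for x ~ N(0, data_std^2) and
    x_noisy = x + N(0, noise_std^2); this minimises the denoising loss
    in this toy setting (assumption: Gaussian data, Gaussian noise)."""
    shrink = data_std**2 / (data_std**2 + noise_std**2)
    return shrink * x_noisy

def dae_score_estimate(x, data_std, noise_std):
    """Denoiser residual scaled by 1 / noise_std^2, which approaches the
    gradient of the log density as noise_std goes to zero."""
    return (optimal_denoiser_gaussian(x, data_std, noise_std) - x) / noise_std**2

x = np.linspace(-2.0, 2.0, 5)
data_std = 1.5
true_score = -x / data_std**2            # d/dx log N(x; 0, data_std^2)
approx_score = dae_score_estimate(x, data_std, noise_std=1e-2)
max_abs_error = float(np.max(np.abs(true_score - approx_score)))
```

In this toy case the residual equals the score of the noise-smoothed density exactly, so the error shrinks as the noise level decreases.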

2020-11-20

Proceedings of the 37th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Countering Language Drift with Seeded Iterated Learning

Yuchen Lu

Soumye Singhal

Florian Strub

Olivier Pietquin

Aaron Courville

Pretraining on human corpus and then finetuning in a simulator has become a standard pipeline for training a goal-oriented dialogue agent. Nevertheless, as soon as the agents are finetuned to maximize task completion, they suffer from the so-called language drift phenomenon: they slowly lose syntactic and semantic properties of language as they only focus on solving the task. In this paper, we propose a generic approach to counter language drift called Seeded Iterated Learning (SIL). We periodically refine a pretrained student agent by imitating data sampled from a newly generated teacher agent. At each time step, the teacher is created by copying the student agent, before being finetuned to maximize task completion. SIL requires neither external syntactic constraints nor semantic knowledge, making it a valuable task-agnostic finetuning protocol. We evaluate SIL in a toy-setting Lewis Game, and then scale it up to the translation game with natural language. In both settings, SIL helps counter language drift and improves task completion compared to baselines.
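The teacher/student loop described in this abstract can be sketched schematically as follows. This is only an outline of the control flow, not the authors' implementation: the agent is a toy dictionary, and `finetune`, `sample_data`, and `imitate` are hypothetical placeholders standing in for task finetuning, data generation, and imitation learning.

```python
import copy

def seeded_iterated_learning(student, finetune, sample_data, imitate, generations):
    """Schematic SIL loop: copy the student into a teacher, finetune the
    teacher on the task, then refine the student by imitating teacher data."""
    for _ in range(generations):
        teacher = copy.deepcopy(student)   # teacher starts as a copy of the student
        teacher = finetune(teacher)        # maximise task completion (placeholder)
        data = sample_data(teacher)        # data sampled from the teacher
        student = imitate(student, data)   # student imitates the sampled data
    return student

# Toy stand-ins (assumptions for illustration only).
def toy_finetune(agent):
    return {"skill": agent["skill"] + 1.0}

def toy_sample_data(agent):
    return agent["skill"]

def toy_imitate(agent, data):
    return {"skill": 0.5 * (agent["skill"] + data)}

final_student = seeded_iterated_learning(
    {"skill": 0.0}, toy_finetune, toy_sample_data, toy_imitate, generations=3
)
```

Passing the three steps as callables keeps the loop independent of any particular model or task.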

2020-11-20

Proceedings of the 37th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Interference and Generalization in Temporal Difference Learning

Emmanuel Bengio

Joelle Pineau

Doina Precup

We study the link between generalization and interference in temporal-difference (TD) learning. Interference is defined as the inner product of two different gradients, representing their alignment. This quantity emerges as being of interest from a variety of observations about neural networks, parameter sharing and the dynamics of learning. We find that TD easily leads to low-interference, under-generalizing parameters, while the effect seems reversed in supervised learning. We hypothesize that the cause can be traced back to the interplay between the dynamics of interference and bootstrapping. This is supported empirically by several observations: the negative relationship between the generalization gap and interference in TD, the negative effect of bootstrapping on interference and the local coherence of targets, and the contrast between the propagation rate of information in TD(0) versus TD(
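The definition of interference as an inner product of two gradients can be made concrete with a small example. The sketch below assumes a linear model with a squared-error loss, which is a simplification chosen for illustration and not the setting studied in the paper; the helper names are hypothetical.

```python
import numpy as np

def squared_loss_grad(w, x, y):
    """Gradient w.r.t. w of 0.5 * (w @ x - y)**2 for a linear model."""
    return (w @ x - y) * x

def interference(w, sample_a, sample_b):
    """Inner product of the per-sample loss gradients at parameters w."""
    g_a = squared_loss_grad(w, *sample_a)
    g_b = squared_loss_grad(w, *sample_b)
    return float(g_a @ g_b)

w = np.array([1.0, -1.0])
a = (np.array([1.0, 0.0]), 0.0)
b = (np.array([0.0, 1.0]), 0.0)   # input orthogonal to a
c = (np.array([2.0, 0.0]), 0.0)   # input aligned with a

rho_ab = interference(w, a, b)    # 0.0: an update on a leaves b's loss unchanged to first order
rho_ac = interference(w, a, c)    # 4.0: an update on a also reduces c's loss
```

A positive value means a gradient step on one sample also reduces the loss on the other, and a value of zero means the two samples do not affect each other to first order.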

2020-11-20

Proceedings of the 37th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Invariant Causal Prediction for Block MDPs

Amy Zhang

Clare Lyle

Shagun Sodhani

Angelos Filos

Marta Kwiatkowska

Joelle Pineau

Yarin Gal

Doina Precup

Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges. In this paper, we consider the problem of learning abstractions that generalize in block MDPs, families of environments with a shared latent state space and dynamics structure over that latent space, but varying observations. We leverage tools from causal inference to propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting. We prove that for certain classes of environments, this approach outputs with high probability a state abstraction corresponding to the causal feature set with respect to the return. We further provide more general bounds on model error and generalization error in the multi-environment setting, in the process showing a connection between causal variable selection and the state abstraction framework for MDPs. We give empirical evidence that our methods work in both linear and nonlinear settings, attaining improved generalization over single- and multi-task baselines.

2020-11-20

Proceedings of the 37th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Latent Variable Modelling with Hyperbolic Normalizing Flows

Avishek Joey Bose

Ariella Smofsky

Renjie Liao

Prakash Panangaden

William L. Hamilton

The choice of approximate posterior distributions plays a central role in stochastic variational inference (SVI). One effective solution is the use of normalizing flows to construct flexible posterior distributions. However, one key limitation of existing normalizing flows is that they are restricted to the Euclidean space and are ill-equipped to model data with an underlying hierarchical structure. To address this fundamental limitation, we present the first extension of normalizing flows to hyperbolic spaces. We first elevate normalizing flows to hyperbolic spaces using coupling transforms defined on the tangent bundle, termed Tangent Coupling (

2020-11-20

Proceedings of the 37th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Linear Lower Bounds and Conditioning of Differentiable Games

Recent successes of game-theoretic formulations in ML have caused a resurgence of research interest in differentiable games. Overwhelmingly, that research focuses on methods and upper bounds on their speed of convergence. In this work, we approach the question of fundamental iteration complexity by providing lower bounds to complement the linear (i.e. geometric) upper bounds observed in the literature on a wide class of problems. We cast saddle-point and min-max problems as 2-player games. We leverage tools from single-objective convex optimisation to propose new linear lower bounds for convex-concave games. Notably, we give a linear lower bound for

2020-11-20

Proceedings of the 37th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Perceptual Generative Autoencoders

Zijun Zhang

Ruixiang Zhang

Zongpeng Li

Yoshua Bengio

Liam Paull

Modern generative models are usually designed to match target distributions directly in the data space, where the intrinsic dimension of data can be much lower than the ambient dimension. We argue that this discrepancy may contribute to the difficulties in training generative models. We therefore propose to map both the generated and target distributions to a latent space using the encoder of a standard autoencoder, and train the generator (or decoder) to match the target distribution in the latent space. Specifically, we enforce the consistency in both the data space and the latent space with theoretically justified data and latent reconstruction losses. The resulting generative model, which we call a perceptual generative autoencoder (PGA), is then trained with a maximum likelihood or variational autoencoder (VAE) objective. With maximum likelihood, PGAs generalize the idea of reversible generative models to unrestricted neural network architectures and arbitrary number of latent dimensions. When combined with VAEs, PGAs substantially improve over the baseline VAEs in terms of sample quality. Compared to other autoencoder-based generative models using simple priors, PGAs achieve state-of-the-art FID scores on CIFAR-10 and CelebA.

2020-11-20

Proceedings of the 37th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Reverse-engineering deep ReLU networks

David Rolnick

Konrad Paul Kording

It has been widely assumed that a neural network cannot be recovered from its outputs, as the network depends on its parameters in a highly nonlinear way. Here, we prove that in fact it is often possible to identify the architecture, weights, and biases of an unknown deep ReLU network by observing only its output. Every ReLU network defines a piecewise linear function, where the boundaries between linear regions correspond to inputs for which some neuron in the network switches between inactive and active ReLU states. By dissecting the set of region boundaries into components associated with particular neurons, we show both theoretically and empirically that it is possible to recover the weights of neurons and their arrangement within the network, up to isomorphism.
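The piecewise linear structure mentioned in this abstract can be seen in a very small case. The sketch below is not the paper's recovery algorithm: it assumes a one-dimensional input and a single hidden layer with hand-picked weights, and it locates region boundaries from outputs alone by looking for slope changes on a grid. All names and values are illustrative.

```python
import numpy as np

def tiny_relu_net(x, w, b, a):
    """One-hidden-layer ReLU network on scalar inputs: sum_i a_i * relu(w_i * x + b_i)."""
    hidden = np.maximum(0.0, np.outer(x, w) + b)   # shape (n_points, n_hidden)
    return hidden @ a

# Hypothetical parameters, treated as unknown by the detection step below.
w = np.array([1.0, 2.0])
b = np.array([-0.5, 1.0])
a = np.array([1.0, -1.0])

grid = np.linspace(-2.0, 2.0, 401)                 # query points, spacing 0.01
outputs = tiny_relu_net(grid, w, b, a)

# A discrete second difference vanishes inside a linear region and is
# nonzero where the slope changes, i.e. where a neuron switches state.
second_diff = outputs[2:] - 2.0 * outputs[1:-1] + outputs[:-2]
detected_kinks = grid[1:-1][np.abs(second_diff) > 1e-6]

expected_kinks = np.sort(-b / w)                   # neuron i switches at x = -b_i / w_i
```

Each detected boundary corresponds to one hidden neuron, which is the one-dimensional analogue of associating boundary components with particular neurons.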

2020-11-20

Proceedings of the 37th International Conference on Machine Learning (published)

proceedings.mlr.press

Revisiting Fundamentals of Experience Replay

William Fedus

Prajit Ramachandran

Rishabh Agarwal

Yoshua Bengio

Hugo Larochelle

Mark Rowland

Will Dabney

Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding. We therefore present a systematic and extensive analysis of experience replay in Q-learning methods, focusing on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected (replay ratio). Our additive and ablative studies upend conventional wisdom around experience replay -- greater capacity is found to substantially increase the performance of certain algorithms, while leaving others unaffected. Counterintuitively we show that theoretically ungrounded, uncorrected n-step returns are uniquely beneficial while other techniques confer limited benefit for sifting through larger memory. Separately, by directly controlling the replay ratio we contextualize previous observations in the literature and empirically measure its importance across a variety of deep RL algorithms. Finally, we conclude by testing a set of hypotheses on the nature of these performance benefits.
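The two quantities studied here, replay capacity and replay ratio, can be illustrated with a minimal buffer and training loop. This is a sketch under simple assumptions: transitions are placeholders, no learning is performed, and the class and function names are hypothetical rather than taken from the paper's code.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity FIFO buffer: the oldest transition is dropped when full."""
    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size, rng):
        pool = list(self.storage)
        return rng.sample(pool, min(batch_size, len(pool)))

    def __len__(self):
        return len(self.storage)

def run(num_env_steps, capacity, replay_ratio, batch_size=4, seed=0):
    """Collect experience and perform replay_ratio updates per collected step."""
    rng = random.Random(seed)
    buffer = ReplayBuffer(capacity)
    updates = 0
    update_budget = 0.0
    for step in range(num_env_steps):
        buffer.add((step, "placeholder transition"))
        update_budget += replay_ratio
        while update_budget >= 1.0:
            batch = buffer.sample(batch_size, rng)  # a learner would train on this batch
            updates += 1
            update_budget -= 1.0
    return updates, len(buffer)

num_updates, buffer_size = run(num_env_steps=100, capacity=32, replay_ratio=0.25)
```

With a replay ratio of 0.25, one update is performed for every four collected transitions, and the buffer never holds more than `capacity` items.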

2020-11-20

Proceedings of the 37th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Universal Equivariant Multilayer Perceptrons

Siamak Ravanbakhsh

Group invariant and equivariant Multilayer Perceptrons (MLP), also known as Equivariant Networks, have achieved remarkable success in learning on a variety of data structures, such as sequences, images, sets, and graphs. Using tools from group theory, this paper proves the universality of a broad class of equivariant MLPs with a single hidden layer. In particular, it is shown that having a hidden layer on which the group acts regularly is sufficient for universal equivariance (invariance). A corollary is unconditional universality of equivariant MLPs for Abelian groups, such as CNNs with a single hidden layer. A second corollary is the universality of equivariant MLPs with a high-order hidden layer, where we give both group-agnostic bounds and means for calculating group-specific bounds on the order of hidden layer that guarantees universal equivariance (invariance).

2020-11-20

Proceedings of the 37th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

What can I do here? A Theory of Affordances in Reinforcement Learning

Khimya Khetarpal

Zafarali Ahmed

Gheorghe Comanici

David Abel

Doina Precup

Reinforcement learning algorithms usually assume that all actions are always available to an agent. However, both people and animals understand the general link between the features of their environment and the actions that are feasible. Gibson (1977) coined the term "affordances" to describe the fact that certain states enable an agent to do certain actions, in the context of embodied agents. In this paper, we develop a theory of affordances for agents who learn and plan in Markov Decision Processes. Affordances play a dual role in this case. On one hand, they allow faster planning, by reducing the number of actions available in any given situation. On the other hand, they facilitate more efficient and precise learning of transition models from data, especially when such models require function approximation. We establish these properties through theoretical results as well as illustrative examples. We also propose an approach to learn affordances and use it to estimate transition models that are simpler and generalize better.
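The planning benefit described here, fewer actions to consider in each state, can be sketched with value iteration restricted by an action mask. The example assumes a small deterministic chain MDP and a hand-specified affordance mask; both are hypothetical, and the paper's formal definition of affordances is more general than a boolean mask.

```python
import numpy as np

def value_iteration(next_state, reward, afforded, gamma=0.9, iters=200):
    """Value iteration on a deterministic MDP, maximising only over afforded actions.
    Assumes every state has at least one afforded action."""
    n_states, _ = next_state.shape
    v = np.zeros(n_states)
    for _ in range(iters):
        q = reward + gamma * v[next_state]     # shape (n_states, n_actions)
        q = np.where(afforded, q, -np.inf)     # non-afforded actions are never chosen
        v = q.max(axis=1)
    return v

# Three-state chain; actions: 0 = left, 1 = right; state 2 is absorbing.
next_state = np.array([[0, 1], [0, 2], [2, 2]])
reward = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]])

all_actions = np.ones((3, 2), dtype=bool)
# Hypothetical affordances: "left" is not afforded in states 0 and 1.
affordances = np.array([[False, True], [False, True], [True, True]])

v_full = value_iteration(next_state, reward, all_actions)
v_afforded = value_iteration(next_state, reward, affordances)
backups_full = int(all_actions.sum())          # 6 state-action pairs per sweep
backups_afforded = int(affordances.sum())      # 4 state-action pairs per sweep
```

In this example the mask removes only suboptimal actions, so both runs reach the same values while the restricted run considers fewer state-action pairs per sweep. A mask that removed an optimal action would change the result.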

2020-11-20

Proceedings of the 37th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press
