Publications

Structured Conditional Continuous Normalizing Flows for Efficient Amortized Inference in Graphical Models

Christian Dietrich Weilbach

Boyan Beronov

Frank Wood

William Harvey

We exploit minimally faithful inversion of graphical model structures to specify sparse continuous normalizing ﬂows (CNFs) for amortized i… (see more)nference. We ﬁnd that the sparsity of this factorization can be exploited to reduce the numbers of parameters in the neural network, adaptive integration steps of the ﬂow, and consequently FLOPs at both training and inference time without decreasing performance in comparison to unconstrained ﬂows. By expressing the structure inversion as a compilation pass in a probabilistic programming language, we are able to apply it in a novel way to models as complex as convolutional neural networks. Furthermore, we extend the training objective for CNFs in the context of inference amortization to the symmetric Kullback-Leibler divergence, and demonstrate its theoretical and practical advantages.

2020-01-01

International Conference on Artificial Intelligence and Statistics (published)

Structured Conditional Continuous Normalizing Flows for Efficient Amortized Inference in Graphical Models

Christian Dietrich Weilbach

Boyan Beronov

Frank Wood

William Harvey

We exploit minimally faithful inversion of graphical model structures to specify sparse continuous normalizing ﬂows (CNFs) for amortized i… (see more)nference. We ﬁnd that the sparsity of this factorization can be exploited to reduce the numbers of parameters in the neural network, adaptive integration steps of the ﬂow, and consequently FLOPs at both training and inference time without decreasing performance in comparison to unconstrained ﬂows. By expressing the structure inversion as a compilation pass in a probabilistic programming language, we are able to apply it in a novel way to models as complex as convolutional neural networks. Furthermore, we extend the training objective for CNFs in the context of inference amortization to the symmetric Kullback-Leibler divergence, and demonstrate its theoretical and practical advantages.

2020-01-01

International Conference on Artificial Intelligence and Statistics (published)

Synbols: Probing Learning Algorithms with Synthetic Datasets

Alexandre Lacoste

Pau Rodr'iguez

Frédéric Branchaud-charron

Parmida Atighehchian

Massimo Caccia

Issam Hadj Laradji

Alexandre Drouin

Matt P. Craddock

Laurent Charlin

David Vazquez

Systematicity in a Recurrent Neural Network by Factorizing Syntax and Semantics

Jacob Russin

Jason Jo

R. O’Reilly

Yoshua Bengio

Standard methods in deep learning fail to capture compositional or systematic structure in their training data, as shown by their inability … (see more)to generalize outside of the training distribution. However, human learners readily generalize in this way, e.g. by applying known grammatical rules to novel words. The inductive biases that might underlie this powerful cognitive capacity remain unclear. Inspired by work in cognitive science suggesting a functional distinction between systems for syntactic and semantic processing, we implement a modiﬁcation to an existing deep learning architecture, imposing an analogous separation. The resulting architecture substantially out-performs standard recurrent networks on the SCAN dataset, a compositional generalization task, without any additional supervision. Our work suggests that separating syntactic from semantic learning may be a useful heuristic for capturing compositional structure, and highlights the potential of using cognitive principles to inform inductive biases in deep learning.

2020-01-01

CogSci (published)

Tensorized Random Projections

Beheshteh T. Rakhshan

Guillaume Rabusseau

2020-01-01

AISTATS (published)

On the Effectiveness of Two-Step Learning for Latent-Variable Models

Cem Subakan

Maxime Gasse

Laurent Charlin

Latent-variable generative models offer a principled solution for modeling and sampling from complex probability distributions. Implementing… (see more) a joint training objective with a complex prior, however, can be a tedious task, as one is typically required to derive and code a specific cost function for each new type of prior distribution. In this work, we propose a general framework for learning latent variable generative models in a two-step fashion. In the first step of the framework, we train an autoencoder, and in the second step we fit a prior model on the resulting latent distribution. This two-step approach offers a convenient alternative to joint training, as it allows for a straightforward combination of existing models without the hustle of deriving new cost functions, and the need for coding the joint training objectives. Through a set of experiments, we demonstrate that two-step learning results in performances similar to joint training, and in some cases even results in more accurate modeling.

2020-01-01

2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP) (published)

doi.org

On the interplay between noise and curvature and its effect on optimization and generalization

Valentin Thomas

Fabian Pedregosa

Bart van Merriënboer

Pierre-Antoine Manzagol

Yoshua Bengio

Nicolas Le Roux

The speed at which one can minimize an expected loss using stochastic methods depends on two properties: the curvature of the loss and the v… (see more)ariance of the gradients. While most previous works focus on one or the other of these properties, we explore how their interaction affects optimization speed. Further, as the ultimate goal is good generalization performance, we clarify how both curvature and noise are relevant to properly estimate the generalization gap. Realizing that the limitations of some existing works stems from a confusion between these matrices, we also clarify the distinction between the Fisher matrix, the Hessian, and the covariance matrix of the gradients.

2020-01-01

AISTATS (published)

On the Systematicity of Probing Contextualized Word Representations: The Case of Hypernymy in BERT.

Abhilasha Ravichander

Eduard Hovy

Kaheer Suleman

Adam Trischler

Jackie Cheung

2020-01-01

*SEM@COLING (published)

The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget

Anirudh Goyal

Yoshua Bengio

Matthew Botvinick

Sergey Levine

In many applications, it is desirable to extract only the relevant information from complex input data, which involves making a decision abo… (see more)ut which input features are relevant. The information bottleneck method formalizes this as an information-theoretic optimization problem by maintaining an optimal tradeoff between compression (throwing away irrelevant input information), and predicting the target. In many problem settings, including the reinforcement learning problems we consider in this work, we might prefer to compress only part of the input. This is typically the case when we have a standard conditioning input, such as a state observation, and a ``privileged'' input, which might correspond to the goal of a task, the output of a costly planning algorithm, or communication with another agent. In such cases, we might prefer to compress the privileged input, either to achieve better generalization (e.g., with respect to goals) or to minimize access to costly information (e.g., in the case of communication). Practical implementations of the information bottleneck based on variational inference require access to the privileged input in order to compute the bottleneck variable, so although they perform compression, this compression operation itself needs unrestricted, lossless access. In this work, we propose the variational bandwidth bottleneck, which decides for each example on the estimated value of the privileged information before seeing it, i.e., only based on the standard input, and then accordingly chooses stochastically, whether to access the privileged input or not. We formulate a tractable approximation to this framework and demonstrate in a series of reinforcement learning experiments that it can improve generalization and reduce access to computationally costly information.

2020-01-01

ICLR (published)

openreview.net

A Tight and Unified Analysis of Gradient-Based Methods for a Whole Spectrum of Differentiable Games.

Waiss Azizian

Ioannis Mitliagkas

Simon Lacoste-Julien

Gauthier Gidel

2020-01-01

International Conference on Artificial Intelligence and Statistics (published)

A Tight and Unified Analysis of Gradient-Based Methods for a Whole Spectrum of Differentiable Games

Waiss Azizian

Ioannis Mitliagkas

Simon Lacoste-Julien

Gauthier Gidel

We consider diﬀerentiable games where the goal is to ﬁnd a Nash equilibrium. The machine learning community has recently started using v… (see more)ariants of the gradient method ( GD ). Prime examples are extragradient ( EG ), the optimistic gradient method ( OG ) and consensus optimization ( CO ), which enjoy linear convergence in cases like bilinear games, where the standard GD fails. The full bene-ﬁts of theses relatively new methods are not known as there is no uniﬁed analysis for both strongly monotone and bilinear games. We provide new analyses of the EG ’s local and global convergence properties and use is to get a tighter global convergence rate for OG and CO . Our analysis covers the whole range of settings between bilinear and strongly monotone games. It reveals that these methods converges via diﬀerent mechanisms at these extremes; in between, it exploits the most favorable mechanism for the given problem. We then prove that EG achieves the optimal rate for a wide class of algorithms with any number of extrapolations. Our tight analysis of EG ’s convergence rate in games shows that, unlike in convex minimization, EG may be much faster than GD .

2020-01-01

International Conference on Artificial Intelligence and Statistics (published)