Frank Wood

Structured Conditional Continuous Normalizing Flows for Efficient Amortized Inference in Graphical Models

Christian Dietrich Weilbach

Boyan Beronov

William Harvey

We exploit minimally faithful inversion of graphical model structures to specify sparse continuous normalizing flows (CNFs) for amortized inference. We find that the sparsity of this factorization can be exploited to reduce the numbers of parameters in the neural network, adaptive integration steps of the flow, and consequently FLOPs at both training and inference time without decreasing performance in comparison to unconstrained flows. By expressing the structure inversion as a compilation pass in a probabilistic programming language, we are able to apply it in a novel way to models as complex as convolutional neural networks. Furthermore, we extend the training objective for CNFs in the context of inference amortization to the symmetric Kullback-Leibler divergence, and demonstrate its theoretical and practical advantages.

2020-01-01

International Conference on Artificial Intelligence and Statistics (published)

proceedings.mlr.press

Near-Optimal Glimpse Sequences for Improved Hard Attention Neural Network Training

William Harvey

Michael Teng

Frank Wood

Hard visual attention is a promising approach to reduce the computational burden of modern computer vision methodologies. However, hard attention mechanisms can be difficult and slow to train, which is especially costly for applications like neural architecture search where multiple networks must be trained. We introduce a method to amortise the cost of training by generating an extra supervision signal for a subset of the training data. This supervision is in the form of sequences of ‘good’ locations to attend to for each image. We find that the best method to generate supervision sequences comes from framing hard attention for image classification as a Bayesian optimal experimental design (BOED) problem. From this perspective, the optimal locations to attend to are those which provide the greatest expected reduction in the entropy of the classification distribution. We introduce methodology from the BOED literature to approximate this optimal behaviour and generate ‘near-optimal’ supervision sequences. We then present a hard attention network training objective that makes use of these sequences and show that it allows faster training than prior work. We finally demonstrate the utility of faster hard attention training by incorporating supervision sequences in a neural architecture search, resulting in hard attention architectures which can outperform networks with access to the entire image.

2019-06-13

ArXiv (preprint)

doi.org

arxiv.org

LF-PPL: A Low-Level First Order Probabilistic Programming Language for Non-Differentiable Models

Yuanshuo Zhou

Bradley Gram-Hansen

Tobias Kohn

Tom Rainforth

Hongseok Yang

Frank Wood

We develop a new Low-level, First-order Probabilistic Programming Language (LF-PPL) suited for models containing a mix of continuous, discrete, and/or piecewise-continuous variables. The key success of this language and its compilation scheme is in its ability to automatically distinguish parameters the density function is discontinuous with respect to, while further providing runtime checks for boundary crossings. This enables the introduction of new inference engines that are able to exploit gradient information, while remaining efficient for models which are not everywhere differentiable. We demonstrate this ability by incorporating a discontinuous Hamiltonian Monte Carlo (DHMC) inference engine that is able to deliver automated and efficient inference for non-differentiable models. Our system is backed up by a mathematical formalism that ensures that any model expressed in this language has a density with measure-zero discontinuities to maintain the validity of the inference engine.

2019-03-06

ArXiv (preprint)

arxiv.org

LF-PPL: A Low-Level First Order Probabilistic Programming Language for Non-Differentiable Models

Yuanshuo Zhou

Bradley Gram-Hansen

Tobias Kohn

Tom Rainforth

Hongseok Yang

Frank Wood

We develop a new Low-level, First-order Probabilistic Programming Language (LF-PPL) suited for models containing a mix of continuous, discrete, and/or piecewise-continuous variables. The key success of this language and its compilation scheme is in its ability to automatically distinguish parameters the density function is discontinuous with respect to, while further providing runtime checks for boundary crossings. This enables the introduction of new inference engines that are able to exploit gradient information, while remaining efficient for models which are not everywhere differentiable. We demonstrate this ability by incorporating a discontinuous Hamiltonian Monte Carlo (DHMC) inference engine that is able to deliver automated and efficient inference for non-differentiable models. Our system is backed up by a mathematical formalism that ensures that any model expressed in this language has a density with measure-zero discontinuities to maintain the validity of the inference engine.

2019-01-01

AISTATS (published)

proceedings.mlr.press

arxiv.org
