Facilitating Multimodality in Normalizing Flows
The true Bayesian posterior of a model such as a neural network may be highly multimodal. In principle, normalizing flows can represent such… (see more) a distribution via compositions of invertible transformations of random noise. In practice, however, existing normalizing flows may fail to capture most of the modes of a distribution. We argue that the conditionally affine structure of the transformations used in [Dinh et al., 2014, 2016, Kingma et al., 2016] is inefficient, and show that flows which instead use (conditional) invertible non-linear transformations naturally enable multimodality in their output distributions. With just two layers of our proposed deep sigmoidal flow, we are able to model complicated 2d energy functions with much higher fidelity than six layers of deep affine flows.
Fetal, Infant and Ophthalmic Medical Image Analysis
M. Cardoso
Andrew Melbourne
Hrvoje Bogunovic
Pim Moeskops
Xinjian Chen
Ernst Schwartz
M. Garvin
E. Robinson
E. Trucco
Michael Ebner
Yanwu Xu
Antonios Makropoulos
Adrien Desjardin
Tom Kamiel Magda Vercauteren
Fetal, Infant and Ophthalmic Medical Image Analysis
M. Jorge Cardoso
Andrew Melbourne
Hrvoje Bogunovic
Pim Moeskops
Xinjian Chen
Ernst Schwartz
M. Garvin
E. Robinson
E. Trucco
Michael Ebner
Yanwu Xu
Antonios Makropoulos
Adrien Desjardin
Tom Kamiel Magda Vercauteren
GibbsNet: Iterative Adversarial Inference for Deep Graphical Models
Alex Lamb
Yaroslav Ganin
Joseph Paul Cohen
Directed latent variable models that formulate the joint distribution as …
Hierarchical Methods of Moments
Matteo Ruffini
Borja Balle
Spectral methods of moments provide a powerful tool for learning the parameters of latent variable models. Despite their theoretical appeal,… (see more) the applicability of these methods to real data is still limited due to a lack of robustness to model misspecification. In this paper we present a hierarchical approach to methods of moments to circumvent such limitations. Our method is based on replacing the tensor decomposition step used in previous algorithms with approximate joint diagonalization. Experiments on topic modeling show that our method outperforms previous tensor decomposition methods in terms of speed and model quality.
Improved Training of Wasserstein GANs
Ishaan Gulrajani
Faruk Ahmed
Martin Arjovsky
Vincent Dumoulin
Generative Adversarial Networks (GANs) are powerful generative models, but suffer from training instability. The recently proposed Wasserste… (see more)in GAN (WGAN) makes progress toward stable training of GANs, but sometimes can still generate only low-quality samples or fail to converge. We find that these problems are often due to the use of weight clipping in WGAN to enforce a Lipschitz constraint on the critic, which can lead to undesired behavior. We propose an alternative to clipping weights: penalize the norm of gradient of the critic with respect to its input. Our proposed method performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning, including 101-layer ResNets and language models over discrete data. We also achieve high quality generations on CIFAR-10 and LSUN bedrooms.
Intravascular Imaging and Computer Assisted Stenting and Large-Scale Annotation of Biomedical Data and Expert Label Synthesis
M. Jorge Cardoso
Su-Lin Lee
Veronika Cheplygina
Simone Balocco
Diana Mateus
Guillaume Zahnd
Lena Maier-Hein
Stefanie Demirci
Eric Granger
Luc Duong
M. Carbonneau
Shadi N. Albarqouni
G. Carneiro
Modulating early visual processing by language
Harm de Vries
Florian Strub
Jérémie Mary
Olivier Pietquin
It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view do… (see more)minates the current literature in computational models for language-vision tasks, where visual and linguistic input are mostly processed independently before being fused into a single representation. In this paper, we deviate from this classic pipeline and propose to modulate the \emph{entire visual processing} by linguistic input. Specifically, we condition the batch normalization parameters of a pretrained residual network (ResNet) on a language embedding. This approach, which we call MOdulated RESnet (\MRN), significantly improves strong baselines on two visual question answering tasks. Our ablation study shows that modulating from the early stages of the visual processing is beneficial.
Molecular Imaging, Reconstruction and Analysis of Moving Body Organs, and Stroke Imaging and Treatment
M. Cardoso
Fei Gao
BERNHARD KAINZ
T. Walsum
Kuangyu Shi
Kanwal K. Bhatia
R. Peter
Tom Kamiel Magda Vercauteren
Mauricio Reyes
Adrian Dalca
Roland Wiest
W. Niessen
B. Emmer
Molecular Imaging, Reconstruction and Analysis of Moving Body Organs, and Stroke Imaging and Treatment
M. Jorge Cardoso
Fei Gao
BERNHARD KAINZ
T. Walsum
Kuangyu Shi
Kanwal K. Bhatia
R. Peter
Tom Kamiel Magda Vercauteren
Mauricio Reyes
Adrian Dalca
Roland Wiest
Wiro Niessen
B. Emmer
Multitask Spectral Learning of Weighted Automata
We consider the problem of estimating multiple related functions computed by weighted automata~(WFA). We first present a natural notion of r… (see more)elatedness between WFAs by considering to which extent several WFAs can share a common underlying representation. We then introduce the model of vector-valued WFA which conveniently helps us formalize this notion of relatedness. Finally, we propose a spectral learning algorithm for vector-valued WFAs to tackle the multitask learning problem. By jointly learning multiple tasks in the form of a vector-valued WFA, our algorithm enforces the discovery of a representation space shared between tasks. The benefits of the proposed multitask approach are theoretically motivated and showcased through experiments on both synthetic and real world datasets.
Piecewise Latent Variables for Neural Variational Text Processing
Iulian V. Serban
Alexander G. Ororbia II
Advances in neural variational inference have facilitated the learning of powerful directed graphical models with continuous latent variable… (see more)s, such as variational autoencoders. The hope is that such models will learn to represent rich, multi-modal latent factors in real-world data, such as natural language text. However, current models often assume simplistic priors on the latent variables - such as the uni-modal Gaussian distribution - which are incapable of representing complex latent factors efficiently. To overcome this restriction, we propose the simple, but highly flexible, piecewise constant distribution. This distribution has the capacity to represent an exponential number of modes of a latent target distribution, while remaining mathematically tractable. Our results demonstrate that incorporating this new latent distribution into different models yields substantial improvements in natural language processing tasks such as document modeling and natural language generation for dialogue.