Improved Training of Wasserstein GANs
Ishaan Gulrajani
Faruk Ahmed
Martin Arjovsky
Vincent Dumoulin
Generative Adversarial Networks (GANs) are powerful generative models, but suffer from training instability. The recently proposed Wasserste… (see more)in GAN (WGAN) makes progress toward stable training of GANs, but sometimes can still generate only low-quality samples or fail to converge. We find that these problems are often due to the use of weight clipping in WGAN to enforce a Lipschitz constraint on the critic, which can lead to undesired behavior. We propose an alternative to clipping weights: penalize the norm of gradient of the critic with respect to its input. Our proposed method performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning, including 101-layer ResNets and language models over discrete data. We also achieve high quality generations on CIFAR-10 and LSUN bedrooms.
Intravascular Imaging and Computer Assisted Stenting and Large-Scale Annotation of Biomedical Data and Expert Label Synthesis
M. Jorge Cardoso
Su-Lin Lee
Veronika Cheplygina
Simone Balocco
Diana Mateus
Guillaume Zahnd
Lena Maier-Hein
Stefanie Demirci
Eric Granger
Luc Duong
M. Carbonneau
Shadi N. Albarqouni
G. Carneiro
Modulating early visual processing by language
Harm de Vries
Florian Strub
Jérémie Mary
Olivier Pietquin
It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view do… (see more)minates the current literature in computational models for language-vision tasks, where visual and linguistic input are mostly processed independently before being fused into a single representation. In this paper, we deviate from this classic pipeline and propose to modulate the \emph{entire visual processing} by linguistic input. Specifically, we condition the batch normalization parameters of a pretrained residual network (ResNet) on a language embedding. This approach, which we call MOdulated RESnet (\MRN), significantly improves strong baselines on two visual question answering tasks. Our ablation study shows that modulating from the early stages of the visual processing is beneficial.
Molecular Imaging, Reconstruction and Analysis of Moving Body Organs, and Stroke Imaging and Treatment
M. Cardoso
Fei Gao
BERNHARD KAINZ
T. Walsum
Kuangyu Shi
Kanwal K. Bhatia
R. Peter
Tom Kamiel Magda Vercauteren
Mauricio Reyes
Adrian Dalca
Roland Wiest
W. Niessen
B. Emmer
Molecular Imaging, Reconstruction and Analysis of Moving Body Organs, and Stroke Imaging and Treatment
M. Jorge Cardoso
Fei Gao
BERNHARD KAINZ
T. Walsum
Kuangyu Shi
Kanwal K. Bhatia
R. Peter
Tom Kamiel Magda Vercauteren
Mauricio Reyes
Adrian Dalca
Roland Wiest
Wiro Niessen
B. Emmer
Multitask Spectral Learning of Weighted Automata
We consider the problem of estimating multiple related functions computed by weighted automata~(WFA). We first present a natural notion of r… (see more)elatedness between WFAs by considering to which extent several WFAs can share a common underlying representation. We then introduce the model of vector-valued WFA which conveniently helps us formalize this notion of relatedness. Finally, we propose a spectral learning algorithm for vector-valued WFAs to tackle the multitask learning problem. By jointly learning multiple tasks in the form of a vector-valued WFA, our algorithm enforces the discovery of a representation space shared between tasks. The benefits of the proposed multitask approach are theoretically motivated and showcased through experiments on both synthetic and real world datasets.
Piecewise Latent Variables for Neural Variational Text Processing
Iulian V. Serban
Alexander G. Ororbia II
Advances in neural variational inference have facilitated the learning of powerful directed graphical models with continuous latent variable… (see more)s, such as variational autoencoders. The hope is that such models will learn to represent rich, multi-modal latent factors in real-world data, such as natural language text. However, current models often assume simplistic priors on the latent variables - such as the uni-modal Gaussian distribution - which are incapable of representing complex latent factors efficiently. To overcome this restriction, we propose the simple, but highly flexible, piecewise constant distribution. This distribution has the capacity to represent an exponential number of modes of a latent target distribution, while remaining mathematically tractable. Our results demonstrate that incorporating this new latent distribution into different models yields substantial improvements in natural language processing tasks such as document modeling and natural language generation for dialogue.
PROCLIVITY PATTERNS IN ATTRIBUTED GRAPHS
Dhivya Eswaran
Christos Faloutsos
Artur Dubrawski
Many real world applications include information on both attributes of individual entities as well as relations between them, while there ex… (see more)ists an interplay between these attributes and relations. For example, in a typical social network, the similarity of individuals’ characteristics motivates them to form relations, a.k.a. social selection; whereas the characteristics of individuals may be affected by the characteristics of their relations, a.k.a. social influence. We can measure proclivity in networks by quantifying the correlation of nodal attributes and the structure [1]. Here, we are interested in a more fundamental study, to extend the basic statistics defined for graphs and draw parallels for the attributed graphs. More formally, an attributed graph is denoted by (A,X); where An×n is the adjacency matrix and encodes the relationships between the n nodes, and Xn×k is the attributes matrix –each row shows the feature vector of the corresponding node. Degree of a node encodes the number of its neighbors, computed as ki = ∑ j Aij . We can extend this notion to networks with binary attributes to the number of neighbors which share a particular attribute x, i.e. ki(x) = ∑ j Aijδ(Xj , x); where δ(Xj , x) = 1 iff node j has attribute x. Similar to the simple graphs, where the degree distribution is studied and showed to be heavy tail, here we can look at: 1) the degree distributions per attribute, 2) the joint probability distribution of any pair of attributes. Moreover, if we assume A(x1, x2) is the induced subgraph (or masked matrix of edges) with endpoints of values (x1, x2), i.e., A(x1, x2) = Aijδ(Xi, x1)δ(Xj , x2), then we can study and compare these distributions for the induced subgraph per each pair of attribute values. For example, Figure 1 shows the same general trend in the distribution of the original graph and the three possible induced subgraph.
Sequentialized Sampling Importance Resampling and Scalable IWAE
Chin-Wei Huang
We propose a new sequential algorithm for Sampling Importance Resampling. The algorithm serves as a solution to expensive evaluation of impo… (see more)rtance weight, and can be interpreted as stochastically and iteratively refining the particles by correcting them towards the target distribution as pool size increases. We apply this algorithm to variational inference with Importance Weighted Lower Bound and propose a memory-scalable training procedure 1 that implicitly improves the variational proposal. 1 Sequentializing Sampling Importance Resampling 1.1 Sampling Importance Resampling Given an unnormalized target distribution p̃(x) and proposal distribution q(x), the Sampling Importance Resampling (SIR) proceeds as follows: 1. draw xi for 1 ≤ i ≤ n from q(x) 2. calculate the importance weight wi = p̃(xi) q(xi) 3. calculate the normalized importance weight w̄i = wi ∑ i wi 4. draw index variable yj ∼ mul(w̄1, ..., w̄n) for 1 ≤ j ≤ m The density of the set of resampled particles xy1 , ..., xym should resemble the pdf of the target distribution, and the new samples will be approximately distributed by p(x) (Bishop, 2007). On average, the samples can be improved by increasing the pool size n, and becomes corrected when n→∞. The procedure is visualized in Figure 1a. 1.2 SeqSIR The above procedure can be combined with the idea of reservoir sampling, so that we need not evaluate all n samples at the same time, which will be an issue when n is large or when evaluation of a sample (i.e. computation of wi) is expensive. The intuition is to keep a running sum of the importance weights while we evaluate the pool samples sequentially, and then decide to keep the old sample or replace it with the new one based on the ratio of the new sample’s importance weight to the running sum. This is what we call Sequentialized Sampling Importance Resampling (SEQSIR), which is summarized in Algorithm 1. See Figure 1b for illustration. Note that density and importance weight are computed on log scale to deal with numerical instability, and log-sum-exp operation (LSE) is used in place of addition to calculate the running sum of See https://github.com/CW-Huang/SeqIWAE for implementation. Second workshop on Bayesian Deep Learning (NIPS 2017), Long Beach, CA, USA. Algorithm 1 Sequentialized Sampling Importance Resampling and Stochastic Iterative Refinement procedure SEQSIR ( logp, logq . unnormalized target density function and proposal density function ss . n samples to be evaluated ) A←−∞ . initialize accumulated sum of importance weight on log scale s_old← 0 . initialize sample n← len([s1,...,sn]) for i=1,...,n do s_new = ss[i] A, s_old← STOCHREFINE(logp, logq, A, s_old, s_new) return s_old procedure STOCHREFINE ( logp, logq . unnormalized target density function and proposal density function A . accumulated sum of importance weight on log scale s_old, s_new . old and new samples ) w_new← logp(s_new) logq(s_new) A← LSE(A, w_new) u← unif(0,1) if w_new A >= log u then return A, s_new else return A, s_old
Diet Networks: Thin Parameters for Fat Genomic
pierre luc carrier
Akram Erraqabi
Tristan Sylvain
Alex Auvolat
Etienne Dejoie
M. Dubé
Learning tasks such as those involving genomic data often poses a serious challenge: the number of input features can be orders of magnitude… (see more) larger than the number of training examples, making it difficult to avoid overfitting, even when using the known regularization techniques. We focus here on tasks in which the input is a description of the genetic variation specific to a patient, the single nucleotide polymorphisms (SNPs), yielding millions of ternary inputs. Improving the ability of deep learning to handle such datasets could have an important impact in precision medicine, where high-dimensional data regarding a particular patient is used to make predictions of interest. Even though the amount of data for such tasks is increasing, this mismatch between the number of examples and the number of inputs remains a concern. Naive implementations of classifier neural networks involve a huge number of free parameters in their first layer: each input feature is associated with as many parameters as there are hidden units. We propose a novel neural network parametrization which considerably reduces the number of free parameters. It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g. for each position in the genome where variations are observed), and then learn (with another neural network called the parameter prediction network) how to map a feature's distributed representation to the vector of parameters specific to that feature in the classifier neural network (the weights which link the value of the feature to each of the hidden units). We show experimentally on a population stratification task of interest to medical studies that the proposed approach can significantly reduce both the number of parameters and the error rate of the classifier.
HeMIS: Hetero-Modal Image Segmentation
Mohammad Havaei
Nicolas Guizard
A Multisensor Multi-Bernoulli Filter
Augustin-Alexandru Saucan
In this paper, we derive a multisensor multi-Bernoulli (MS-MeMBer) filter for multitarget tracking. Measurements from multiple sensors are e… (see more)mployed by the proposed filter to update a set of tracks modeled as a multi-Bernoulli random finite set. An exact implementation of the MS-MeMBer update procedure is computationally intractable. We propose an efficient approximate implementation by using a greedy measurement partitioning mechanism. The proposed filter allows for Gaussian mixture or particle filter implementations. Numerical simulations conducted for both linear-Gaussian and nonlinear models highlight the improved accuracy of the MS-MeMBer filter and its reduced computational load with respect to the multisensor cardinalized probability hypothesis density filter and the iterated-corrector cardinality-balanced multi-Bernoulli filter especially for low probabilities of detection.