Molecular Imaging, Reconstruction and Analysis of Moving Body Organs, and Stroke Imaging and Treatment
M. Jorge Cardoso
Fei Gao
Bernhard Kainz
T. Walsum
Kuangyu Shi
Kanwal K. Bhatia
R. Peter
Tom Kamiel Magda Vercauteren
Mauricio Reyes
Adrian Dalca
Roland Wiest
Wiro Niessen
B. Emmer
Multitask Spectral Learning of Weighted Automata
We consider the problem of estimating multiple related functions computed by weighted automata (WFAs). We first present a natural notion of relatedness between WFAs by considering to what extent several WFAs can share a common underlying representation. We then introduce the model of vector-valued WFAs, which conveniently helps us formalize this notion of relatedness. Finally, we propose a spectral learning algorithm for vector-valued WFAs to tackle the multitask learning problem. By jointly learning multiple tasks in the form of a vector-valued WFA, our algorithm enforces the discovery of a representation space shared between tasks. The benefits of the proposed multitask approach are theoretically motivated and showcased through experiments on both synthetic and real-world datasets.
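For readers unfamiliar with the underlying object, the sketch below shows how a single (scalar-valued) weighted automaton computes a function over strings from an initial vector, per-symbol transition matrices, and a final vector. This is only an illustrative, assumption-based example of the standard WFA parametrization, not the authors' vector-valued model or their spectral learning algorithm; all names and the toy parameters are made up.

```python
import numpy as np

# Minimal sketch of a weighted finite automaton (WFA), assuming the standard
# parametrization: f(x1...xk) = alpha^T A_{x1} ... A_{xk} omega.
class WFA:
    def __init__(self, alpha, transitions, omega):
        self.alpha = np.asarray(alpha)            # (d,) initial weights
        self.transitions = {s: np.asarray(M)      # symbol -> (d, d) matrix
                            for s, M in transitions.items()}
        self.omega = np.asarray(omega)            # (d,) final weights

    def __call__(self, word):
        v = self.alpha
        for symbol in word:
            v = v @ self.transitions[symbol]
        return float(v @ self.omega)

# Toy 2-state WFA over the alphabet {"a", "b"} (values chosen arbitrarily).
wfa = WFA(alpha=[1.0, 0.0],
          transitions={"a": [[0.5, 0.5], [0.0, 1.0]],
                       "b": [[1.0, 0.0], [0.3, 0.7]]},
          omega=[0.0, 1.0])
print(wfa("ab"))
```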
Piecewise Latent Variables for Neural Variational Text Processing
Iulian V. Serban
Alexander G. Ororbia II
Advances in neural variational inference have facilitated the learning of powerful directed graphical models with continuous latent variables, such as variational autoencoders. The hope is that such models will learn to represent rich, multi-modal latent factors in real-world data, such as natural language text. However, current models often assume simplistic priors on the latent variables, such as the unimodal Gaussian distribution, which are incapable of representing complex latent factors efficiently. To overcome this restriction, we propose the simple but highly flexible piecewise constant distribution. This distribution has the capacity to represent an exponential number of modes of a latent target distribution while remaining mathematically tractable. Our results demonstrate that incorporating this new latent distribution into different models yields substantial improvements in natural language processing tasks such as document modeling and natural language generation for dialogue.
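To make the notion of a piecewise constant distribution concrete, here is a minimal sketch that samples from a density on [0, 1] which is constant on each of n equal-width pieces, via inverse-CDF sampling. This is an assumption-laden illustration of such a distribution in general, not the paper's exact parametrization or its use inside a variational autoencoder.

```python
import numpy as np

def sample_piecewise_constant(piece_weights, size=1, rng=None):
    """Sample from a density on [0, 1] that is constant on each of n
    equal-width pieces, with unnormalized mass piece_weights[i] on piece i.
    Illustrative sketch only."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.asarray(piece_weights, dtype=float)
    n = len(w)
    probs = w / w.sum()
    # Pick a piece according to its mass, then sample uniformly inside it.
    pieces = rng.choice(n, size=size, p=probs)
    u = rng.uniform(size=size)
    return (pieces + u) / n

samples = sample_piecewise_constant([1.0, 5.0, 0.5, 3.0], size=10_000)
print(samples.mean())
```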
Proclivity Patterns in Attributed Graphs
Dhivya Eswaran
Christos Faloutsos
Artur Dubrawski
Many real-world applications include information on both the attributes of individual entities and the relations between them, and there exists an interplay between these attributes and relations. For example, in a typical social network, the similarity of individuals' characteristics motivates them to form relations, a.k.a. social selection; conversely, the characteristics of individuals may be affected by the characteristics of their relations, a.k.a. social influence. We can measure proclivity in networks by quantifying the correlation of nodal attributes and structure [1]. Here, we are interested in a more fundamental study: extending the basic statistics defined for graphs and drawing parallels for attributed graphs. More formally, an attributed graph is denoted by (A, X), where A ∈ {0,1}^{n×n} is the adjacency matrix encoding the relationships between the n nodes, and X ∈ R^{n×k} is the attribute matrix, each row of which is the feature vector of the corresponding node. The degree of a node counts its neighbors, computed as k_i = ∑_j A_ij. For networks with binary attributes, we can extend this notion to the number of neighbors sharing a particular attribute value x, i.e. k_i(x) = ∑_j A_ij δ(X_j, x), where δ(X_j, x) = 1 iff node j has attribute value x. Just as the degree distribution of simple graphs is studied and shown to be heavy-tailed, here we can look at: 1) the degree distribution per attribute value, and 2) the joint probability distribution of any pair of attribute values. Moreover, if A(x1, x2) denotes the induced subgraph (or masked edge matrix) whose edges have endpoints with values (x1, x2), i.e. A(x1, x2)_ij = A_ij δ(X_i, x1) δ(X_j, x2), then we can study and compare these distributions for the induced subgraph of each pair of attribute values. For example, Figure 1 shows the same general trend in the distribution of the original graph and the three possible induced subgraphs.
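The per-attribute degree and the induced subgraph defined above are straightforward to compute directly from the definitions. The sketch below does so with NumPy for a small binary-attributed toy graph; the graph, attribute values, and function names are assumptions made for this illustration, not the authors' code.

```python
import numpy as np

# Toy attributed graph: A is the n x n adjacency matrix, X holds one binary
# attribute per node (X[j] is the attribute value of node j).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
X = np.array([1, 0, 1, 1])

# Plain degree: k_i = sum_j A_ij
k = A.sum(axis=1)

# Per-attribute degree: k_i(x) = sum_j A_ij * delta(X_j, x)
def degree_per_attribute(A, X, x):
    return A @ (X == x).astype(int)

# Induced subgraph on endpoint values (x1, x2):
# A(x1, x2)_ij = A_ij * delta(X_i, x1) * delta(X_j, x2)
def induced_subgraph(A, X, x1, x2):
    rows = (X == x1).astype(int)[:, None]
    cols = (X == x2).astype(int)[None, :]
    return A * rows * cols

print(k, degree_per_attribute(A, X, 1), induced_subgraph(A, X, 1, 0), sep="\n")
```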
Sequentialized Sampling Importance Resampling and Scalable IWAE
Chin-Wei Huang
We propose a new sequential algorithm for Sampling Importance Resampling. The algorithm serves as a solution to the expensive evaluation of importance weights, and can be interpreted as stochastically and iteratively refining the particles by correcting them towards the target distribution as the pool size increases. We apply this algorithm to variational inference with the Importance Weighted Lower Bound and propose a memory-scalable training procedure that implicitly improves the variational proposal.

1 Sequentializing Sampling Importance Resampling

1.1 Sampling Importance Resampling

Given an unnormalized target distribution p̃(x) and a proposal distribution q(x), Sampling Importance Resampling (SIR) proceeds as follows:
1. draw x_i from q(x) for 1 ≤ i ≤ n
2. calculate the importance weights w_i = p̃(x_i) / q(x_i)
3. calculate the normalized importance weights w̄_i = w_i / ∑_i w_i
4. draw index variables y_j ∼ Mult(w̄_1, ..., w̄_n) for 1 ≤ j ≤ m
The density of the set of resampled particles x_{y_1}, ..., x_{y_m} should resemble the pdf of the target distribution, and the new samples will be approximately distributed according to p(x) (Bishop, 2007). On average, the samples can be improved by increasing the pool size n, and become correctly distributed as n → ∞. The procedure is visualized in Figure 1a.

1.2 SeqSIR

The above procedure can be combined with the idea of reservoir sampling, so that we need not evaluate all n samples at the same time, which is an issue when n is large or when evaluating a sample (i.e., computing w_i) is expensive. The intuition is to keep a running sum of the importance weights while evaluating the pool samples sequentially, and then decide whether to keep the old sample or replace it with the new one based on the ratio of the new sample's importance weight to the running sum. This is what we call Sequentialized Sampling Importance Resampling (SeqSIR), summarized in Algorithm 1 and illustrated in Figure 1b. Densities and importance weights are computed on the log scale to deal with numerical instability, and the log-sum-exp operation (LSE) is used in place of addition to maintain the running sum. See https://github.com/CW-Huang/SeqIWAE for an implementation.

Algorithm 1: Sequentialized Sampling Importance Resampling and Stochastic Iterative Refinement
procedure SEQSIR(logp, logq, ss)    ▷ unnormalized target and proposal log-densities; ss = [s_1, ..., s_n] samples to be evaluated
    A ← −∞                          ▷ accumulated sum of importance weights, on log scale
    s_old ← 0                       ▷ initialize sample
    n ← len(ss)
    for i = 1, ..., n do
        s_new ← ss[i]
        A, s_old ← STOCHREFINE(logp, logq, A, s_old, s_new)
    return s_old

procedure STOCHREFINE(logp, logq, A, s_old, s_new)    ▷ A: running log-sum of weights; s_old, s_new: old and new samples
    w_new ← logp(s_new) − logq(s_new)
    A ← LSE(A, w_new)
    u ← Unif(0, 1)
    if w_new − A ≥ log u then
        return A, s_new
    else
        return A, s_old
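A minimal, self-contained Python transcription of the SeqSIR / StochRefine pseudocode above is given below. It was written from the pseudocode here rather than taken from the linked repository, and the Gaussian target and proposal in the demo are assumptions chosen only to exercise the procedure.

```python
import numpy as np

def stoch_refine(logp, logq, A, s_old, s_new, rng):
    """One refinement step: fold the new sample's log importance weight into
    the running log-sum A, and keep the new sample with prob. exp(w_new - A)."""
    w_new = logp(s_new) - logq(s_new)
    A = np.logaddexp(A, w_new)               # LSE(A, w_new)
    if w_new - A >= np.log(rng.uniform()):
        return A, s_new
    return A, s_old

def seq_sir(logp, logq, samples, rng=None):
    """Sequentialized Sampling Importance Resampling over a stream of samples."""
    rng = np.random.default_rng() if rng is None else rng
    A, s_old = -np.inf, None                 # first sample is always accepted
    for s_new in samples:
        A, s_old = stoch_refine(logp, logq, A, s_old, s_new, rng)
    return s_old

# Demo with an assumed toy setup: target N(3, 1), proposal N(0, 4),
# both given as unnormalized log-densities.
logp = lambda x: -0.5 * (x - 3.0) ** 2
logq = lambda x: -0.5 * (x / 2.0) ** 2
rng = np.random.default_rng(0)
pool = rng.normal(0.0, 2.0, size=5000)       # draws from the proposal
resampled = [seq_sir(logp, logq, pool, rng) for _ in range(200)]
print(np.mean(resampled))                    # should land near the target mean of 3
```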
Diet Networks: Thin Parameters for Fat Genomics
Pierre Luc Carrier
Akram Erraqabi
Tristan Sylvain
Alex Auvolat
Etienne Dejoie
M. Dubé
Learning tasks such as those involving genomic data often pose a serious challenge: the number of input features can be orders of magnitude larger than the number of training examples, making it difficult to avoid overfitting even when using known regularization techniques. We focus here on tasks in which the input is a description of the genetic variation specific to a patient, the single nucleotide polymorphisms (SNPs), yielding millions of ternary inputs. Improving the ability of deep learning to handle such datasets could have an important impact in precision medicine, where high-dimensional data regarding a particular patient is used to make predictions of interest. Even though the amount of data for such tasks is increasing, this mismatch between the number of examples and the number of inputs remains a concern. Naive implementations of classifier neural networks involve a huge number of free parameters in their first layer: each input feature is associated with as many parameters as there are hidden units. We propose a novel neural network parametrization which considerably reduces the number of free parameters. It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g., for each position in the genome where variations are observed), and then learn (with another neural network called the parameter prediction network) how to map a feature's distributed representation to the vector of parameters specific to that feature in the classifier neural network (the weights which link the value of the feature to each of the hidden units). We show experimentally, on a population stratification task of interest to medical studies, that the proposed approach can significantly reduce both the number of parameters and the error rate of the classifier.
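The core parameter-saving idea, predicting the first-layer weights from per-feature embeddings rather than learning them freely, can be sketched as follows. This is a simplified illustration under assumed sizes and with a single linear map as the parameter prediction network; it is not the paper's architecture or training procedure.

```python
import numpy as np

# Sketch of the Diet Networks idea: instead of a free (n_features x n_hidden)
# first-layer weight matrix, each feature gets an embedding, and a small
# "parameter prediction network" maps that embedding to the feature's weights.
rng = np.random.default_rng(0)
n_features, n_hidden, emb_dim = 10_000, 100, 32      # assumed sizes

feature_emb = rng.normal(size=(n_features, emb_dim)) # per-feature embedding
pred_W = rng.normal(size=(emb_dim, n_hidden)) * 0.01 # prediction net (one linear map here)

# Predicted first-layer weights: one n_hidden-dim weight vector per feature.
W1 = feature_emb @ pred_W                            # (n_features, n_hidden)

def first_layer(x):
    """Forward pass of the classifier's first layer using predicted weights."""
    return np.maximum(x @ W1, 0.0)                   # ReLU

x = rng.integers(0, 3, size=(4, n_features)).astype(float)  # toy SNP-like ternary input
print(first_layer(x).shape)                          # (4, 100)

# Free parameters: emb_dim * n_hidden (plus the embeddings, which can be
# precomputed from the data) instead of n_features * n_hidden.
```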
HeMIS: Hetero-Modal Image Segmentation
Mohammad Havaei
Nicolas Guizard
A Multisensor Multi-Bernoulli Filter
Augustin-Alexandru Saucan
In this paper, we derive a multisensor multi-Bernoulli (MS-MeMBer) filter for multitarget tracking. Measurements from multiple sensors are employed by the proposed filter to update a set of tracks modeled as a multi-Bernoulli random finite set. An exact implementation of the MS-MeMBer update procedure is computationally intractable. We propose an efficient approximate implementation that uses a greedy measurement partitioning mechanism. The proposed filter allows for Gaussian mixture or particle filter implementations. Numerical simulations conducted for both linear-Gaussian and nonlinear models highlight the improved accuracy of the MS-MeMBer filter and its reduced computational load with respect to the multisensor cardinalized probability hypothesis density filter and the iterated-corrector cardinality-balanced multi-Bernoulli filter, especially for low probabilities of detection.
Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus
Iulian V. Serban
Alberto García-Durán
Caglar Gulcehre
Sungjin Ahn
Over the past decade, large-scale supervised learning corpora have enabled machine learning researchers to make substantial advances. However, to date, there are no large-scale question-answer corpora available. In this paper we present the 30M Factoid Question-Answer Corpus, an enormous question-answer pair corpus produced by applying a novel neural network architecture to the knowledge base Freebase to transduce facts into natural language questions. The produced question-answer pairs are evaluated both by human evaluators and using automatic evaluation metrics, including well-established machine translation and sentence similarity metrics. Across all evaluation criteria the question-generation model outperforms the competing template-based baseline. Furthermore, when presented to human evaluators, the generated questions appear comparable in quality to real human-generated questions.
Movie Description
Anna Rohrbach
Atousa Torabi
Marcus Rohrbach
Niket Tandon
Bernt Schiele