Generative Adversarial Networks (GANs) are powerful generative models, but suffer from training instability. The recently proposed Wasserstein GAN (WGAN) makes progress toward stable training of GANs, but can still sometimes generate only low-quality samples or fail to converge. We find that these problems are often due to the use of weight clipping in WGAN to enforce a Lipschitz constraint on the critic, which can lead to undesired behavior. We propose an alternative to clipping weights: penalizing the norm of the gradient of the critic with respect to its input. Our proposed method performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning, including 101-layer ResNets and language models over discrete data. We also achieve high-quality generations on CIFAR-10 and LSUN bedrooms.
2016-12-31
Advances in Neural Information Processing Systems 30 (NIPS 2017) (published)
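To make the gradient-penalty idea concrete, here is a minimal PyTorch sketch of the penalty term described in the WGAN-GP abstract above. The names (critic, lambda_gp) and the assumption of 4-D image batches are illustrative, not the authors' code.

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    # Interpolate between real and generated samples (fake is assumed to be
    # detached from the generator graph). Shapes assume image batches (N,C,H,W).
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    # Gradient of the critic's output with respect to its input.
    grads = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True)[0]
    grads = grads.view(grads.size(0), -1)
    # Penalize the deviation of the gradient norm from 1 (the Lipschitz target).
    return lambda_gp * ((grads.norm(2, dim=1) - 1) ** 2).mean()
```

The penalty is added to the critic loss in place of weight clipping; create_graph=True is what allows the penalty term itself to be backpropagated through.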
It has been postulated that a good representation is one that disentangles the underlying explanatory factors of variation. However, it remains an open question what kind of training framework could potentially achieve that. Whereas most previous work focuses on the static setting (e.g., with images), we postulate that some of the causal factors could be discovered if the learner is allowed to interact with its environment. The agent can experiment with different actions and observe their effects. More specifically, we hypothesize that some of these factors correspond to aspects of the environment which are independently controllable, i.e., that there exists a policy and a learnable feature for each such aspect of the environment, such that this policy can yield changes in that feature with minimal changes to other features that explain the statistical variations in the observed data. We propose a specific objective function to find such factors and verify experimentally that it can indeed disentangle independently controllable aspects of the environment without any extrinsic reward signal.
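As a rough illustration of the kind of objective described above, and not the paper's exact loss, the following sketch scores how selectively acting with policy k changes feature k relative to all other learned features; every name here is hypothetical.

```python
import torch

def selectivity_loss(features_before, features_after, k):
    # features_*: (num_factors,) outputs of a learned feature encoder,
    # evaluated before and after executing policy k for a few steps.
    delta = (features_after - features_before).abs()
    # Fraction of the total feature change attributable to feature k.
    selectivity = delta[k] / (delta.sum() + 1e-8)
    # Maximizing selectivity encourages policy k to control feature k
    # while leaving the other features (hence other factors) unchanged.
    return -selectivity
```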
It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view dominates the current literature in computational models for language-vision tasks, where visual and linguistic input are mostly processed independently before being fused into a single representation. In this paper, we deviate from this classic pipeline and propose to modulate the entire visual processing by linguistic input. Specifically, we condition the batch normalization parameters of a pretrained residual network (ResNet) on a language embedding. This approach, which we call MOdulated RESnet (MODERN), significantly improves strong baselines on two visual question answering tasks. Our ablation study shows that modulating from the early stages of visual processing is beneficial.
2016-12-31
Advances in Neural Information Processing Systems 30 (NIPS 2017) (published)
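A minimal sketch of the conditioning mechanism described in the abstract above: batch-normalization scale and shift predicted from a language embedding. The module and dimension names are illustrative, and the exact parameterization may differ from the paper's.

```python
import torch
import torch.nn as nn

class ConditionalBatchNorm2d(nn.Module):
    def __init__(self, num_features, embed_dim):
        super().__init__()
        # Plain batch norm without its own affine parameters.
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        # Per-channel scale/shift deltas predicted from the language embedding.
        self.gamma = nn.Linear(embed_dim, num_features)
        self.beta = nn.Linear(embed_dim, num_features)

    def forward(self, x, lang_embedding):
        out = self.bn(x)
        gamma = 1.0 + self.gamma(lang_embedding)  # start near the identity
        beta = self.beta(lang_embedding)
        return gamma[:, :, None, None] * out + beta[:, :, None, None]
```

Replacing the frozen BN layers of a pretrained ResNet with such modules lets a question embedding modulate every stage of visual processing.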
Advances in neural variational inference have facilitated the learning of powerful directed graphical models with continuous latent variables, such as variational autoencoders. The hope is that such models will learn to represent rich, multi-modal latent factors in real-world data, such as natural language text. However, current models often assume simplistic priors on the latent variables - such as the uni-modal Gaussian distribution - which are incapable of representing complex latent factors efficiently. To overcome this restriction, we propose the simple, but highly flexible, piecewise constant distribution. This distribution has the capacity to represent an exponential number of modes of a latent target distribution, while remaining mathematically tractable. Our results demonstrate that incorporating this new latent distribution into different models yields substantial improvements in natural language processing tasks such as document modeling and natural language generation for dialogue.
2016-12-31
Conference on Empirical Methods in Natural Language Processing (published)
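To fix intuition for the distribution proposed above, here is one hedged sketch of a piecewise constant distribution on [0, 1]: n equal-width bins with learned logits, and density constant within each bin. This is an illustrative parameterization, not necessarily the paper's.

```python
import torch

def sample_piecewise_constant(logits, num_samples):
    # logits: (num_bins,) unnormalized log-probabilities, one per bin.
    num_bins = logits.size(0)
    probs = torch.softmax(logits, dim=0)
    # Pick a bin per sample, then place the sample uniformly inside it.
    bins = torch.multinomial(probs, num_samples, replacement=True)
    u = torch.rand(num_samples)
    return (bins.float() + u) / num_bins
```

With n bins per dimension, a d-dimensional latent of this form can place mass on up to n^d bin combinations, which is the "exponential number of modes" the abstract refers to, while the density and CDF remain trivial to evaluate.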
Natural image modeling is a landmark challenge of unsupervised learning. Variational Autoencoders (VAEs) learn a useful latent representation and model global structure well but have difficulty capturing small details. PixelCNN models details very well, but lacks a latent code and is difficult to scale for capturing large structures. We present PixelVAE, a VAE model with an autoregressive decoder based on PixelCNN. Our model requires very few expensive autoregressive layers compared to PixelCNN and learns latent codes that are more compressed than a standard VAE while still capturing most non-trivial structure. Finally, we extend our model to a hierarchy of latent variables at different scales. Our model achieves state-of-the-art performance on binarized MNIST, competitive performance on 64 × 64 ImageNet, and high-quality samples on the LSUN bedrooms dataset.
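The autoregressive part of such a decoder rests on masked convolutions; below is a minimal PixelCNN-style masked convolution sketch (mask 'A' hides the current pixel for the first layer, 'B' allows it afterwards). It is illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ("A", "B")
        kh, kw = self.kernel_size
        mask = torch.ones(kh, kw)
        # Zero out weights on pixels at/after the centre in raster order;
        # type 'B' keeps the centre, type 'A' masks it as well.
        mask[kh // 2, kw // 2 + (mask_type == "B"):] = 0
        mask[kh // 2 + 1:, :] = 0
        self.register_buffer("mask", mask.expand_as(self.weight).clone())

    def forward(self, x):
        self.weight.data *= self.mask
        return super().forward(x)
```

In a PixelVAE-style decoder, the latent code is decoded deterministically, upsampled, and fed as extra conditioning input to a short stack of such layers, which is why only a few expensive autoregressive layers are needed.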
Many real-world applications include information on both the attributes of individual entities and the relations between them, and there exists an interplay between these attributes and relations. For example, in a typical social network, the similarity of individuals' characteristics motivates them to form relations, a.k.a. social selection; whereas the characteristics of individuals may be affected by the characteristics of their relations, a.k.a. social influence. We can measure proclivity in networks by quantifying the correlation of nodal attributes and the structure [1]. Here, we are interested in a more fundamental study: extending the basic statistics defined for graphs and drawing parallels for attributed graphs. More formally, an attributed graph is denoted by $(A, X)$, where $A \in \{0,1\}^{n \times n}$ is the adjacency matrix encoding the relationships between the $n$ nodes, and $X \in \mathbb{R}^{n \times k}$ is the attribute matrix whose $i$-th row is the feature vector of node $i$. The degree of a node counts its neighbors, computed as $k_i = \sum_j A_{ij}$. For networks with binary attributes, we can extend this notion to the number of neighbors that share a particular attribute value $x$, i.e., $k_i(x) = \sum_j A_{ij}\,\delta(X_j, x)$, where $\delta(X_j, x) = 1$ iff node $j$ has attribute $x$. Just as the degree distribution of simple graphs has been studied and shown to be heavy-tailed, here we can look at: 1) the degree distribution per attribute, and 2) the joint probability distribution of any pair of attributes. Moreover, if we let $A(x_1, x_2)$ be the induced subgraph (or masked matrix of edges) with endpoints of values $(x_1, x_2)$, i.e., $A(x_1, x_2)_{ij} = A_{ij}\,\delta(X_i, x_1)\,\delta(X_j, x_2)$, then we can study and compare these distributions for the induced subgraph of each pair of attribute values. For example, Figure 1 shows the same general trend in the distributions of the original graph and the three possible induced subgraphs.
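The quantities defined above translate directly into a few lines of NumPy; the helper names below are illustrative.

```python
import numpy as np

def degree(A):
    # k_i = sum_j A_ij
    return A.sum(axis=1)

def degree_per_attribute(A, attrs, value):
    # k_i(x) = sum_j A_ij * delta(X_j, x): neighbors holding attribute value x.
    return A @ (attrs == value).astype(A.dtype)

def induced_subgraph(A, attrs, x1, x2):
    # A(x1, x2)_ij = A_ij * delta(X_i, x1) * delta(X_j, x2)
    mask_i = (attrs == x1).astype(A.dtype)[:, None]
    mask_j = (attrs == x2).astype(A.dtype)[None, :]
    return A * mask_i * mask_j

# Toy example: random (directed) graph with binary node attributes.
rng = np.random.default_rng(0)
A = rng.binomial(1, 0.1, size=(100, 100))
attrs = rng.binomial(1, 0.5, size=100)
print(degree(induced_subgraph(A, attrs, 1, 0))[:10])
```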
We propose a reparameterization of LSTM that brings the benefits of batch normalization to recurrent neural networks. Whereas previous works only apply batch normalization to the input-to-hidden transformation of RNNs, we demonstrate that it is both possible and beneficial to batch-normalize the hidden-to-hidden transition, thereby reducing internal covariate shift between time steps. We evaluate our proposal on various sequential problems such as sequence classification, language modeling and question answering. Our empirical results show that our batch-normalized LSTM consistently leads to faster convergence and improved generalization.
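A compact sketch of the idea, assuming PyTorch. Unlike the paper, which keeps separate normalization statistics per time step and initializes the BN gains to small values, this simplified cell shares statistics across time; all names are illustrative.

```python
import torch
import torch.nn as nn

class BNLSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.W_x = nn.Linear(input_size, 4 * hidden_size, bias=False)
        self.W_h = nn.Linear(hidden_size, 4 * hidden_size, bias=False)
        self.bias = nn.Parameter(torch.zeros(4 * hidden_size))
        # Normalize the input-to-hidden and hidden-to-hidden streams separately.
        self.bn_x = nn.BatchNorm1d(4 * hidden_size)
        self.bn_h = nn.BatchNorm1d(4 * hidden_size)
        self.bn_c = nn.BatchNorm1d(hidden_size)

    def forward(self, x, state):
        h, c = state
        gates = self.bn_x(self.W_x(x)) + self.bn_h(self.W_h(h)) + self.bias
        i, f, g, o = gates.chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(self.bn_c(c))
        return h, c
```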
In this paper we propose a novel model for the unconditional audio generation task that generates one audio sample at a time. We show that our model, which combines memory-less modules, namely an autoregressive multilayer perceptron, with stateful recurrent neural networks in a hierarchical structure, is in fact able to capture the underlying sources of variation in the temporal domain over very long time spans, on three datasets of different nature. Human evaluation of the generated samples indicates that our model is preferred over competing models. We also show how each component of the model contributes to the exhibited performance.
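A structural sketch of such a hierarchy, with all sizes and names illustrative: a stateful frame-level RNN summarizes coarse context, and a memory-less MLP emits one sample at a time conditioned on that summary and the most recent samples.

```python
import torch
import torch.nn as nn

class TwoTierSampleModel(nn.Module):
    def __init__(self, frame_size=16, hidden=512, quant_levels=256):
        super().__init__()
        # Stateful module: runs once per frame, carries long-range memory.
        self.frame_rnn = nn.GRU(frame_size, hidden, batch_first=True)
        # Memory-less module: predicts a categorical over quantized samples.
        self.mlp = nn.Sequential(
            nn.Linear(hidden + frame_size, hidden), nn.ReLU(),
            nn.Linear(hidden, quant_levels),
        )

    def forward(self, frames, prev_samples):
        # frames:       (B, T, frame_size) past audio grouped into frames
        # prev_samples: (B, T, frame_size) the most recent raw samples
        context, _ = self.frame_rnn(frames)
        h = torch.cat([context, prev_samples], dim=-1)
        return self.mlp(h)  # logits for the next audio sample
```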