Frank-Wolfe Splitting via Augmented Lagrangian Method
Minimizing a function over an intersection of convex sets is an important task in optimization that is often much more challenging than minimizing it over each individual constraint set. While traditional methods such as Frank-Wolfe (FW) or proximal gradient descent assume access to a linear or quadratic oracle on the intersection, splitting techniques take advantage of the structure of each set and only require access to the oracle on the individual constraints. In this work, we develop and analyze the Frank-Wolfe Augmented Lagrangian (FW-AL) algorithm, a method for minimizing a smooth function over convex compact sets related by a "linear consistency" constraint that only requires access to a linear minimization oracle over the individual constraints. It is based on the Augmented Lagrangian Method (ALM), also known as the Method of Multipliers, but unlike most existing splitting methods, it only requires access to linear (instead of quadratic) minimization oracles. We use recent advances in the analysis of the Frank-Wolfe and alternating direction method of multipliers algorithms to prove a sublinear convergence rate for FW-AL over general convex compact sets and a linear convergence rate for polytopes.
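To make the constraint structure concrete, here is a sketch of the standard augmented Lagrangian splitting reformulation the abstract describes; the notation (A, B, rho, lambda) is illustrative, not quoted from the paper:

```latex
\min_{x \in \mathcal{X},\; y \in \mathcal{Y}} f(x)
\quad \text{s.t.} \quad Ax = By
\qquad\Longrightarrow\qquad
\mathcal{L}_\rho(x, y, \lambda)
  = f(x) + \langle \lambda,\, Ax - By \rangle
  + \tfrac{\rho}{2}\,\lVert Ax - By \rVert^{2}.
```

An ALM-style scheme alternates (approximate) minimization of the augmented Lagrangian over x in X and y in Y, steps that a Frank-Wolfe subroutine can carry out using only a linear minimization oracle on each set, with a dual ascent step on lambda to enforce the linear consistency constraint.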
Fraternal Dropout
Konrad Żołna
Devansh Arpit
Dendi Suhubdy
Recurrent neural networks (RNNs) are an important class of architectures among neural networks, useful for language modeling and sequential prediction. However, optimizing RNNs is known to be harder than optimizing feed-forward neural networks, and a number of techniques have been proposed in the literature to address this problem. In this paper we propose a simple technique called fraternal dropout that takes advantage of dropout to achieve this goal. Specifically, we propose to train two identical copies of an RNN (that share parameters) with different dropout masks while minimizing the difference between their (pre-softmax) predictions. In this way our regularization encourages the representations of RNNs to be invariant to the dropout mask, and thus more robust. We show that our regularization term is upper bounded by the expectation-linear dropout objective, which has been shown to address the gap between the train and inference phases of dropout. We evaluate our model and achieve state-of-the-art results in sequence modeling tasks on two benchmark datasets, Penn Treebank and WikiText-2. We also show that our approach leads to performance improvement by a significant margin in image captioning (Microsoft COCO) and semi-supervised (CIFAR-10) tasks.
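The regularizer is simple enough to sketch directly. Below is a minimal illustration in PyTorch under our own assumptions (the toy feed-forward model, the batch, and the weight kappa are placeholders, not the paper's RNN setup): two stochastic forward passes draw two independent dropout masks, and an L2 penalty ties the two pre-softmax outputs together.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of the fraternal dropout regularizer (illustrative toy
# model, not the paper's RNN setup). A module in training mode resamples
# its dropout mask on every forward call, so two passes over the same
# batch use two independent masks.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Dropout(0.5), nn.Linear(32, 5))
x = torch.randn(8, 10)                  # toy batch
targets = torch.randint(0, 5, (8,))     # toy labels
kappa = 0.1                             # regularization weight (hypothetical value)

logits_a = model(x)                     # first dropout mask
logits_b = model(x)                     # second, independent dropout mask

# Prediction loss plus the fraternal penalty on the pre-softmax outputs
# (the paper averages the prediction loss over both copies; one copy is
# shown here for brevity).
loss = F.cross_entropy(logits_a, targets) + kappa * F.mse_loss(logits_a, logits_b)
loss.backward()
```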
Graph Attention Networks
Petar Veličković
Guillem Cucurull
Arantxa Casanova
Pietro Lio
A Hierarchical Neural Attention-based Text Classifier
Koustuv Sinha
Yue Dong
Derek Ruths
Deep neural networks have been displaying superior performance over traditional supervised classifiers in text classification. They learn to extract useful features automatically when a sufficient amount of data is presented. However, along with the growth in the number of documents comes an increase in the number of categories, which often results in poor performance of multiclass classifiers. In this work, we use external knowledge in the form of topic category taxonomies to aid classification by introducing a deep hierarchical neural attention-based classifier. Our model performs better than or comparably to state-of-the-art hierarchical models at significantly lower computational cost while maintaining high interpretability.
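As a rough illustration of the hierarchical idea (a generic two-level taxonomy classifier of our own devising, not the paper's attention architecture): predicting the coarse topic category first lets each fine-grained head face only a small label set, which is where the computational savings come from.

```python
import torch
import torch.nn as nn

# Generic two-level taxonomy classifier, sketched to illustrate the idea
# of hierarchical classification: first predict the coarse category, then
# predict the fine label with a head dedicated to that category.
class TwoLevelClassifier(nn.Module):
    def __init__(self, feat_dim, n_coarse, fine_per_coarse):
        super().__init__()
        self.coarse_head = nn.Linear(feat_dim, n_coarse)
        self.fine_heads = nn.ModuleList(
            [nn.Linear(feat_dim, fine_per_coarse) for _ in range(n_coarse)]
        )

    def forward(self, features):
        coarse_logits = self.coarse_head(features)
        branch = coarse_logits.argmax(dim=-1)
        # Route each example to the fine head of its predicted coarse class.
        fine_logits = torch.stack(
            [self.fine_heads[b](f) for b, f in zip(branch.tolist(), features)]
        )
        return coarse_logits, fine_logits
```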
HoME: a Household Multimodal Environment
Simon Brodeur
Ethan Perez
Ankesh Anand
Florian Golemo
Luca Celotti
Florian Strub
Jean Rouat
We introduce HoME: a Household Multimodal Environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context. HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible platform extensible to tasks in reinforcement learning, language grounding, sound-based navigation, robotics, multi-agent learning, and more. We hope HoME better enables artificial agents to learn as humans do: in an interactive, multimodal, and richly contextualized setting.
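Since HoME is OpenAI Gym-compatible, agents interact with it through the standard Gym loop. A minimal sketch follows; the environment id "HoME-Navigation-v0" is hypothetical, for illustration only.

```python
import gym

# Standard OpenAI Gym interaction loop that a Gym-compatible platform
# like HoME supports. The environment id is hypothetical.
env = gym.make("HoME-Navigation-v0")
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()            # random agent
    obs, reward, done, info = env.step(action)
env.close()
```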
How can we do better? Pitfalls in biomedical challenge design and how to address them
Annika Reinke
Matthias Eisenmann
Sinan Onogur
Marko Stankovic
Patrick Scholz
Hrvoje Bogunovic
Andrew P. Bradley
Aaron Carass
Carolin Feldmann
Alejandro F. Frangi
Peter M. Full
Bram van Ginneken
Allan Hanbury
Katrin Honauer
Michal Kozubek
Bennett A. Landman
Keno März
Oskar Maier
Klaus Maier-Hein
Bjoern Menze
Henning Müller
Peter F. Neher
Wiro Niessen
Nasir Rajpoot
Gregory C. Sharp
Korsuk Sirinukunwattana
Stefanie Speidel
Christian Stock
Danail Stoyanov
Abdel Aziz Taha
Fons van der Sommen
Ching-Wei Wang
Marc-André Weber
Guoyan Zheng
Pierre Jannin
Lena Maier-Hein
Since the first MICCAI grand challenge was organized in 2007 [1], the impact of biomedical image analysis challenges on both the research field and individual careers has been steadily growing. For example, the acceptance of a journal article today often depends on the performance of a new algorithm being assessed against state-of-the-art work on publicly available challenge datasets. Furthermore, the results matter both for individuals' scientific careers and for the potential of algorithms to be translated into clinical practice. Yet, while the publication of papers in scientific journals and prestigious conferences, such as MICCAI, undergoes strict quality control, the design and organization of challenges do not. To investigate the effect of common practice, we have formed an international initiative dedicated to analyzing and improving a variety of aspects related to biomedical challenge design, execution, and reporting [2]. In the first part of our abstract presentation at the LABELS workshop, we will present some of the major pitfalls of biomedical image analysis challenges today. Specifically, we will look at the following research question. RQ1: How robust are challenge rankings? What is the effect of (i) the specific test cases used, (ii) the specific metric variant(s) applied, and (iii) the rank aggregation method chosen (e.g., aggregating metric values with the mean vs. the median)?
(Shared first/senior authors.)
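As a toy illustration of RQ1's aggregation question, the following sketch (our own invented numbers, not challenge data) shows that the same per-case metric values can produce opposite rankings under mean and median aggregation.

```python
import statistics

# The same per-case metric values can yield different challenge rankings
# depending on how they are aggregated across test cases.
scores = {
    "algorithm_A": [0.90, 0.90, 0.10],   # strong on most cases, one failure
    "algorithm_B": [0.70, 0.70, 0.70],   # uniformly mediocre
}

by_mean = sorted(scores, key=lambda a: statistics.mean(scores[a]), reverse=True)
by_median = sorted(scores, key=lambda a: statistics.median(scores[a]), reverse=True)

print("ranking by mean:  ", by_mean)     # ['algorithm_B', 'algorithm_A']
print("ranking by median:", by_median)   # ['algorithm_A', 'algorithm_B']
```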
Image-to-image translation for cross-domain disentanglement
Abel Gonzalez-Garcia
Joost van de Weijer
Deep image translation methods have recently shown excellent results, outputting high-quality images covering multiple modes of the data distribution. There has also been increased interest in disentangling the internal representations learned by deep methods to further improve their performance and achieve finer control. In this paper, we bridge these two objectives and introduce the concept of cross-domain disentanglement. We aim to separate the internal representation into three parts. The shared part contains information common to both domains. The exclusive parts, on the other hand, contain only factors of variation that are particular to each domain. We achieve this through bidirectional image translation based on Generative Adversarial Networks and cross-domain autoencoders, a novel network component. Our model offers multiple advantages. We can output diverse samples covering multiple modes of the distributions of both domains, perform domain-specific image transfer and interpolation, and carry out cross-domain retrieval without the need for labeled data, requiring only paired images. We compare our model to the state of the art in multi-modal image translation and achieve better results for translation on challenging datasets as well as for cross-domain retrieval on realistic datasets.
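A minimal sketch of the representation split described above, under our own assumptions (a plain MLP backbone and invented layer sizes; the paper's actual encoders are convolutional GAN components):

```python
import torch
import torch.nn as nn

# One domain's encoder emits a shared part (factors common to both
# domains) and an exclusive part (factors specific to this domain).
class DomainEncoder(nn.Module):
    def __init__(self, in_dim=64, shared_dim=16, exclusive_dim=8):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.to_shared = nn.Linear(128, shared_dim)
        self.to_exclusive = nn.Linear(128, exclusive_dim)

    def forward(self, x):
        h = self.backbone(x)
        return self.to_shared(h), self.to_exclusive(h)

# Cross-domain translation then combines the source image's shared code
# with an exclusive code drawn for the target domain.
```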
Improving Explorability in Variational Inference with Annealed Variational Objectives
Chin-Wei Huang
Shawn Tan
Alexandre Lacoste
Despite the advances in the representational capacity of approximate distributions for variational inference, the optimization process can still limit the density that is ultimately learned. We demonstrate the drawbacks of biasing the true posterior to be unimodal, and introduce Annealed Variational Objectives (AVO) into the training of hierarchical variational methods. Inspired by Annealed Importance Sampling, the proposed method facilitates learning by incorporating energy tempering into the optimization objective. In our experiments, we demonstrate our method's robustness to deterministic warm-up, and the benefits of encouraging exploration in the latent space.
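For reference, AVO's inspiration, Annealed Importance Sampling, anneals through the geometric path between an easy initial density f_0 and the target f_T:

```latex
\tilde{f}_t(z) \;\propto\; f_0(z)^{\,1-\beta_t}\, f_T(z)^{\,\beta_t},
\qquad 0 = \beta_0 < \beta_1 < \dots < \beta_T = 1 .
```

Early in the schedule the tempered targets are flatter, which encourages the approximate posterior to explore the latent space before committing to modes.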
Investigating the viability of Generative Models for Novelty Detection
Vidhi Jain
Finding Flatter Minima with SGD
Stanisław Jastrzębski
Zac Kenton
Devansh Arpit
Nicolas Ballas
Asja Fischer
Amos Storkey
It has been observed that over-parameterized deep neural networks (DNNs) trained using stochastic gradient descent (SGD) with smaller batch sizes generalize better than those trained with larger batch sizes. Additionally, model parameters found by small-batch SGD tend to lie in flatter regions. We extend these empirical observations and experimentally show that both a large learning rate and a small batch size contribute towards SGD finding flatter minima that generalize well. Conversely, we find that small learning rates and large batch sizes lead to sharper minima that correlate with poor generalization in DNNs.
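One common way to formalize this learning-rate/batch-size interplay (our summary of this line of work, not a formula quoted from the abstract): writing the minibatch SGD update as

```latex
\theta_{t+1} \;=\; \theta_t \;-\; \frac{\eta}{B} \sum_{i \in \mathcal{B}_t} \nabla \ell_i(\theta_t),
```

the gradient noise injected per step has covariance on the order of eta^2 / B times the per-example gradient covariance, so the ratio eta / B sets the effective noise level; larger noise makes sharp minima harder to settle into, consistent with large learning rates and small batch sizes finding flatter solutions.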