Frank-Wolfe Splitting via Augmented Lagrangian Method
Minimizing a function over an intersection of convex sets is an important task in optimization that is often much more challenging than minimizing it over each individual constraint set. While traditional methods such as Frank-Wolfe (FW) or proximal gradient descent assume access to a linear or quadratic oracle on the intersection, splitting techniques take advantage of the structure of each set and only require access to the oracle on the individual constraints. In this work, we develop and analyze the Frank-Wolfe Augmented Lagrangian (FW-AL) algorithm, a method for minimizing a smooth function over convex compact sets related by a "linear consistency" constraint that only requires access to a linear minimization oracle over the individual constraints. It is based on the Augmented Lagrangian Method (ALM), also known as the Method of Multipliers, but unlike most existing splitting methods, it only requires access to linear (instead of quadratic) minimization oracles. We use recent advances in the analysis of the Frank-Wolfe and alternating direction method of multipliers algorithms to prove a sublinear convergence rate for FW-AL over general convex compact sets and a linear convergence rate for polytopes.
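To make the constraint structure concrete, here is a sketch of the standard augmented Lagrangian splitting reformulation the abstract describes; the notation (A, B, rho, lambda) is illustrative, not quoted from the paper:

```latex
\min_{x \in \mathcal{X},\; y \in \mathcal{Y}} f(x)
\quad \text{s.t.} \quad Ax = By
\qquad\Longrightarrow\qquad
\mathcal{L}_\rho(x, y, \lambda)
  = f(x) + \langle \lambda,\, Ax - By \rangle
  + \tfrac{\rho}{2}\,\lVert Ax - By \rVert^{2}.
```

An ALM-style scheme alternates (approximate) minimization of the augmented Lagrangian over x in X and y in Y, steps that a Frank-Wolfe subroutine can carry out using only a linear minimization oracle on each set, with a dual ascent step on lambda to enforce the linear consistency constraint.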
Fraternal Dropout
Konrad Żołna
Devansh Arpit
Dendi Suhubdy
Recurrent neural networks (RNNs) are an important class of architectures among neural networks, useful for language modeling and sequential prediction. However, optimizing RNNs is known to be harder than optimizing feed-forward neural networks, and a number of techniques have been proposed in the literature to address this problem. In this paper we propose a simple technique called fraternal dropout that takes advantage of dropout to achieve this goal. Specifically, we propose to train two identical copies of an RNN (that share parameters) with different dropout masks while minimizing the difference between their (pre-softmax) predictions. In this way our regularization encourages the representations of RNNs to be invariant to the dropout mask, and thus more robust. We show that our regularization term is upper bounded by the expectation-linear dropout objective, which has been shown to address the gap between the train and inference phases of dropout. We evaluate our model and achieve state-of-the-art results in sequence modeling tasks on two benchmark datasets, Penn Treebank and WikiText-2. We also show that our approach leads to performance improvement by a significant margin in image captioning (Microsoft COCO) and semi-supervised (CIFAR-10) tasks.
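The regularizer is simple enough to sketch directly. Below is a minimal illustration in PyTorch under our own assumptions (the toy feed-forward model, the batch, and the weight kappa are placeholders, not the paper's RNN setup): two stochastic forward passes draw two independent dropout masks, and an L2 penalty ties the two pre-softmax outputs together.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of the fraternal dropout regularizer (illustrative toy
# model, not the paper's RNN setup). A module in training mode resamples
# its dropout mask on every forward call, so two passes over the same
# batch use two independent masks.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Dropout(0.5), nn.Linear(32, 5))
x = torch.randn(8, 10)                  # toy batch
targets = torch.randint(0, 5, (8,))     # toy labels
kappa = 0.1                             # regularization weight (hypothetical value)

logits_a = model(x)                     # first dropout mask
logits_b = model(x)                     # second, independent dropout mask

# Prediction loss plus the fraternal penalty on the pre-softmax outputs
# (the paper averages the prediction loss over both copies; one copy is
# shown here for brevity).
loss = F.cross_entropy(logits_a, targets) + kappa * F.mse_loss(logits_a, logits_b)
loss.backward()
```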
Graph Attention Networks
Petar Veličković
Guillem Cucurull
Arantxa Casanova
Pietro Lio
A Hierarchical Neural Attention-based Text Classifier
Koustuv Sinha
Yue Dong
Derek Ruths
Deep neural networks have been displaying superior performance over traditional supervised classifiers in text classification. They learn to extract useful features automatically when a sufficient amount of data is presented. However, along with the growth in the number of documents comes an increase in the number of categories, which often results in poor performance of multiclass classifiers. In this work, we use external knowledge in the form of topic category taxonomies to aid classification by introducing a deep hierarchical neural attention-based classifier. Our model performs better than or comparably to state-of-the-art hierarchical models at significantly lower computational cost while maintaining high interpretability.
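As a rough illustration of the hierarchical idea (a generic two-level taxonomy classifier of our own devising, not the paper's attention architecture): predicting the coarse topic category first lets each fine-grained head face only a small label set, which is where the computational savings come from.

```python
import torch
import torch.nn as nn

# Generic two-level taxonomy classifier, sketched to illustrate the idea
# of hierarchical classification: first predict the coarse category, then
# predict the fine label with a head dedicated to that category.
class TwoLevelClassifier(nn.Module):
    def __init__(self, feat_dim, n_coarse, fine_per_coarse):
        super().__init__()
        self.coarse_head = nn.Linear(feat_dim, n_coarse)
        self.fine_heads = nn.ModuleList(
            [nn.Linear(feat_dim, fine_per_coarse) for _ in range(n_coarse)]
        )

    def forward(self, features):
        coarse_logits = self.coarse_head(features)
        branch = coarse_logits.argmax(dim=-1)
        # Route each example to the fine head of its predicted coarse class.
        fine_logits = torch.stack(
            [self.fine_heads[b](f) for b, f in zip(branch.tolist(), features)]
        )
        return coarse_logits, fine_logits
```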
HoME: a Household Multimodal Environment
Simon Brodeur
Ethan Perez
Ankesh Anand
Florian Golemo
Luca Celotti
Florian Strub
Jean Rouat
We introduce HoME: a Household Multimodal Environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context. HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible platform extensible to tasks in reinforcement learning, language grounding, sound-based navigation, robotics, multi-agent learning, and more. We hope HoME better enables artificial agents to learn as humans do: in an interactive, multimodal, and richly contextualized setting.
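Since HoME is OpenAI Gym-compatible, agents interact with it through the standard Gym loop. A minimal sketch follows; the environment id "HoME-Navigation-v0" is hypothetical, for illustration only.

```python
import gym

# Standard OpenAI Gym interaction loop that a Gym-compatible platform
# like HoME supports. The environment id is hypothetical.
env = gym.make("HoME-Navigation-v0")
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()            # random agent
    obs, reward, done, info = env.step(action)
env.close()
```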
How can we do better? Pitfalls in biomedical challenge design and how to address them
Annika Reinke
Matthias Eisenmann
Sinan Onogur
Marko Stankovic
Patrick Scholz
Hrvoje Bogunovic
Andrew P. Bradley
Aaron Carass
Carolin Feldmann
Alejandro F. Frangi
Peter M. Full
Bram van Ginneken
Allan Hanbury
Katrin Honauer
Michal Kozubek
Bennett A. Landman
Keno März
Oskar Maier
Klaus Maier-Hein
Bjoern Menze
Henning Müller
Peter F. Neher
Wiro Niessen
Nasir Rajpoot
Gregory C. Sharp
Korsuk Sirinukunwattana
Stefanie Speidel
Christian Stock
Danail Stoyanov
Abdel Aziz Taha
Fons van der Sommen
Ching-Wei Wang
Marc-André Weber
Guoyan Zheng
Pierre Jannin
Lena Maier-Hein
Since the first MICCAI grand challenge was organized in 2007 [1], the impact of biomedical image analysis challenges on both the research field and individual careers has been steadily growing. For example, the acceptance of a journal article today often depends on the performance of a new algorithm being assessed against state-of-the-art work on publicly available challenge datasets. Furthermore, the results matter both for individuals' scientific careers and for the potential of algorithms to be translated into clinical practice. Yet, while the publication of papers in scientific journals and prestigious conferences, such as MICCAI, undergoes strict quality control, the design and organization of challenges do not. To investigate the effect of common practice, we have formed an international initiative dedicated to analyzing and improving a variety of aspects related to biomedical challenge design, execution, and reporting [2]. In the first part of our abstract presentation at the LABELS workshop, we will present some of the major pitfalls of biomedical image analysis challenges today. Specifically, we will look at the following research question. RQ1: How robust are challenge rankings? What is the effect of (i) the specific test cases used, (ii) the specific metric variant(s) applied, and (iii) the rank aggregation method chosen (e.g., aggregating metric values with the mean vs. the median)?
(Shared first/senior authors.)
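As a toy illustration of RQ1's aggregation question, the following sketch (our own invented numbers, not challenge data) shows that the same per-case metric values can produce opposite rankings under mean and median aggregation.

```python
import statistics

# The same per-case metric values can yield different challenge rankings
# depending on how they are aggregated across test cases.
scores = {
    "algorithm_A": [0.90, 0.90, 0.10],   # strong on most cases, one failure
    "algorithm_B": [0.70, 0.70, 0.70],   # uniformly mediocre
}

by_mean = sorted(scores, key=lambda a: statistics.mean(scores[a]), reverse=True)
by_median = sorted(scores, key=lambda a: statistics.median(scores[a]), reverse=True)

print("ranking by mean:  ", by_mean)     # ['algorithm_B', 'algorithm_A']
print("ranking by median:", by_median)   # ['algorithm_A', 'algorithm_B']
```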
Image-to-image translation for cross-domain disentanglement
Abel Gonzalez-Garcia
Joost van de Weijer
Deep image translation methods have recently shown excellent results, outputting high-quality images covering multiple modes of the data distribution. There has also been increased interest in disentangling the internal representations learned by deep methods to further improve their performance and achieve finer control. In this paper, we bridge these two objectives and introduce the concept of cross-domain disentanglement. We aim to separate the internal representation into three parts. The shared part contains information common to both domains. The exclusive parts, on the other hand, contain only factors of variation that are particular to each domain. We achieve this through bidirectional image translation based on Generative Adversarial Networks and cross-domain autoencoders, a novel network component. Our model offers multiple advantages. We can output diverse samples covering multiple modes of the distributions of both domains, perform domain-specific image transfer and interpolation, and carry out cross-domain retrieval without the need for labeled data, requiring only paired images. We compare our model to the state of the art in multi-modal image translation and achieve better results for translation on challenging datasets as well as for cross-domain retrieval on realistic datasets.
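A minimal sketch of the representation split described above, under our own assumptions (a plain MLP backbone and invented layer sizes; the paper's actual encoders are convolutional GAN components):

```python
import torch
import torch.nn as nn

# One domain's encoder emits a shared part (factors common to both
# domains) and an exclusive part (factors specific to this domain).
class DomainEncoder(nn.Module):
    def __init__(self, in_dim=64, shared_dim=16, exclusive_dim=8):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.to_shared = nn.Linear(128, shared_dim)
        self.to_exclusive = nn.Linear(128, exclusive_dim)

    def forward(self, x):
        h = self.backbone(x)
        return self.to_shared(h), self.to_exclusive(h)

# Cross-domain translation then combines the source image's shared code
# with an exclusive code drawn for the target domain.
```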
Improving Explorability in Variational Inference with Annealed Variational Objectives
Chin-Wei Huang
Shawn Tan
Alexandre Lacoste
Despite the advances in the representational capacity of approximate distributions for variational inference, the optimization process can still limit the density that is ultimately learned. We demonstrate the drawbacks of biasing the true posterior to be unimodal, and introduce Annealed Variational Objectives (AVO) into the training of hierarchical variational methods. Inspired by Annealed Importance Sampling, the proposed method facilitates learning by incorporating energy tempering into the optimization objective. In our experiments, we demonstrate our method's robustness to deterministic warm-up, and the benefits of encouraging exploration in the latent space.
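For reference, AVO's inspiration, Annealed Importance Sampling, anneals through the geometric path between an easy initial density f_0 and the target f_T:

```latex
\tilde{f}_t(z) \;\propto\; f_0(z)^{\,1-\beta_t}\, f_T(z)^{\,\beta_t},
\qquad 0 = \beta_0 < \beta_1 < \dots < \beta_T = 1 .
```

Early in the schedule the tempered targets are flatter, which encourages the approximate posterior to explore the latent space before committing to modes.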
Investigating the viability of Generative Models for Novelty Detection
Vidhi Jain
Finding Flatter Minima with SGD
Stanisław Jastrzębski
Zac Kenton
Devansh Arpit
Nicolas Ballas
Asja Fischer
Amos Storkey
It has been observed that over-parameterized deep neural networks (DNNs) trained using stochastic gradient descent (SGD) with smaller batch sizes generalize better than those trained with larger batch sizes. Additionally, model parameters found by small-batch SGD tend to lie in flatter regions. We extend these empirical observations and experimentally show that both a large learning rate and a small batch size contribute towards SGD finding flatter minima that generalize well. Conversely, we find that small learning rates and large batch sizes lead to sharper minima that correlate with poor generalization in DNNs.
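One common way to formalize this learning-rate/batch-size interplay (our summary of this line of work, not a formula quoted from the abstract): writing the minibatch SGD update as

```latex
\theta_{t+1} \;=\; \theta_t \;-\; \frac{\eta}{B} \sum_{i \in \mathcal{B}_t} \nabla \ell_i(\theta_t),
```

the gradient noise injected per step has covariance on the order of eta^2 / B times the per-example gradient covariance, so the ratio eta / B sets the effective noise level; larger noise makes sharp minima harder to settle into, consistent with large learning rates and small batch sizes finding flatter solutions.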