Publications

How to Initialize your Network? Robust Initialization for WeightNorm & ResNets

Devansh Arpit

Víctor Campos

Residual networks (ResNet) and weight normalization play an important role in various deep learning applications. However, parameter initialization strategies have not been studied previously for weight normalized networks and, in practice, initialization methods designed for un-normalized networks are used as a proxy. Similarly, initialization for ResNets has also been studied for un-normalized networks and often under simplified settings ignoring the shortcut connection. To address these issues, we propose a novel parameter initialization strategy that avoids explosion/vanishment of information across layers for weight normalized networks with and without residual connections. The proposed strategy is based on a theoretical analysis using mean field approximation. We run over 2,500 experiments and evaluate our proposal on image datasets showing that the proposed initialization outperforms existing initialization methods in terms of generalization performance, robustness to hyper-parameter values and variance between seeds, especially as networks get deeper, in which case existing methods fail to even start training. Finally, we show that using our initialization in conjunction with learning rate warmup is able to reduce the gap between the performance of weight normalized and batch normalized networks.

arxiv.org

Improving advance medical directives: lessons from Quebec

Louise G. Bernier

Catherine Régis

Policy-makers’ efforts to increase the uptake of advance medical directives (AMDs), and the legal constraints they impose on health professionals, are bringing greater scrutiny to provincial AMD regimes. In 2015, Quebec introduced a new, legally binding form to be filled out for AMDs, which limits individuals’ expression of their wishes to narrow, checklist responses to questions on specific medical interventions. This form-focused regime has other shortcomings: it relies on individuals to self-inform and it does not provide them the opportunity to meaningfully convey their preferences for end-of-life care. A more values-based and collaborative approach provides a better path forward for Quebec and for other provinces.

InfoBot: Structured Exploration in Reinforcement Learning Using Information Bottleneck

Anirudh Goyal

Riashat Islam

D. Strouse

Zafarali Ahmed

Matthew Botvinick

Hugo Larochelle

Yoshua Bengio

Sergey Levine

InfoBot: Transfer and Exploration via the Information Bottleneck

Anirudh Goyal

Riashat Islam

DJ Strouse

Zafarali Ahmed

Matthew Botvinick

Hugo Larochelle

Sergey Levine

Yoshua Bengio

A central challenge in reinforcement learning is discovering effective policies for tasks where rewards are sparsely distributed. We postulate that in the absence of useful reward signals, an effective exploration strategy should seek out decision states. These states lie at critical junctions in the state space from where the agent can transition to new, potentially unexplored regions. We propose to learn about decision states from prior experience. By training a goal-conditioned policy with an information bottleneck, we can identify decision states by examining where the model actually leverages the goal state. We find that this simple mechanism effectively identifies decision states, even in partially observed settings. In effect, the model learns the sensory cues that correlate with potential subgoals. In new environments, this model can then identify novel subgoals for further exploration, guiding the agent through a sequence of potential decision states and through new regions of the state space.

2019-01-01

ICLR.cc/2019/Conference (poster)

openreview.net

Interpolated Adversarial Training: Achieving Robust Neural Networks without Sacrificing Accuracy

Alex Lamb

Vikas Verma

Juho Kannala

Yoshua Bengio

Adversarial robustness has become a central goal in deep learning, both in theory and practice. However, successful methods to improve adversarial robustness (such as adversarial training) greatly hurt generalization performance on the clean data. This could have a major impact on how adversarial robustness affects real-world systems (i.e. many may opt to forgo robustness if it can improve performance on the clean data). We propose Interpolated Adversarial Training, which employs recently proposed interpolation-based training methods in the framework of adversarial training. On CIFAR-10, adversarial training increases clean test error from 5.8% to 16.7%, whereas with our Interpolated Adversarial Training we retain adversarial robustness while achieving a clean test error of only 6.5%. With our technique, the relative error increase for the robust model is reduced from 187.9% to just 12.1%.

2019-01-01

arXiv.org (preprint)

dblp.uni-trier.de
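The relative error figures quoted in the Interpolated Adversarial Training abstract above can be checked with a few lines of arithmetic. The sketch below assumes that "relative error increase" means the percentage increase in clean test error over the 5.8% baseline; the function and variable names are illustrative and do not come from the paper.

```python
def relative_error_increase(baseline_error: float, robust_error: float) -> float:
    """Percentage increase of robust_error over baseline_error."""
    return 100.0 * (robust_error - baseline_error) / baseline_error


# Clean test errors (%) on CIFAR-10, as quoted in the abstract.
baseline = 5.8       # without adversarial training
adversarial = 16.7   # with standard adversarial training
interpolated = 6.5   # with Interpolated Adversarial Training

adv_increase = relative_error_increase(baseline, adversarial)
iat_increase = relative_error_increase(baseline, interpolated)

print(f"adversarial training:  {adv_increase:.1f}%")   # 187.9%
print(f"interpolated training: {iat_increase:.1f}%")   # 12.1%
```

Under that reading, both percentages in the abstract follow directly from the three quoted error rates.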

Lagrangian neurodynamics for real-time error-backpropagation across cortical areas

Dominik Dold

Akos F. Kungl

João Sacramento

Mihai A. Petrovici

Kaspar Schindler

Jonathan Binas

Yoshua Bengio

Walter Senn

Learning Brain Dynamics from Calcium Imaging with Coupled van der Pol and LSTM

Germán Abrevaya

Irina Rish

Aleksandr Y. Aravkin

Guillermo Cecchi

James Kozloski

Pablo Polosecki

Peng Zheng

Silvina Ponce Dawson

Juliana Y. Rhee

David Daniel Cox

Many real-world data sets, especially in biology, are produced by complex nonlinear dynamical systems. In this paper, we focus on brain calcium imaging (CaI) of different organisms (zebrafish and rat), aiming to build a model of joint activation dynamics in large neuronal populations, including the whole brain of zebrafish. We propose a new approach for capturing dynamics of temporal SVD components that uses the coupled (multivariate) van der Pol (VDP) oscillator, a nonlinear ordinary differential equation (ODE) model describing neural activity, with a new parameter estimation technique that combines variable projection optimization and stochastic search. We show that the approach successfully handles nonlinearities and hidden state variables in the coupled VDP. The approach is accurate, achieving 0.82 to 0.94 correlation between the actual and model-generated components, and interpretable, as VDP’s coupling matrix reveals anatomically meaningful positive (excitatory) and negative (inhibitory) interactions across different brain subsystems corresponding to spatial SVD components. Moreover, VDP is comparable to (or sometimes better than) recurrent neural networks (LSTM) for (short-term) prediction of future brain activity; VDP needs fewer parameters to train, which was a plus on our small training data. Finally, the overall best predictive method, greatly outperforming both VDP and LSTM in short- and long-term predictive settings on both datasets, was the new hybrid VDP-LSTM approach that used VDP to simulate a large domain-specific dataset for LSTM pretraining; note that simple LSTM data-augmentation via noisy versions of training data was much less effective.
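For readers unfamiliar with the van der Pol oscillator named in the abstract above, here is a minimal sketch of the classical single-variable form, x'' - mu (1 - x^2) x' + x = 0, integrated with a fixed-step fourth-order Runge-Kutta scheme. It is an illustration only: the coupled multivariate model, the coupling matrix, and the parameter estimation technique from the paper are not reproduced, and the function names, initial conditions, and step size are assumptions chosen for the example.

```python
def vdp_derivatives(x, v, mu):
    """First-order form of the van der Pol equation: x' = v, v' = mu*(1 - x^2)*v - x."""
    return v, mu * (1.0 - x * x) * v - x


def simulate_vdp(x0=0.5, v0=0.0, mu=1.0, dt=0.01, steps=5000):
    """Integrate a single van der Pol oscillator with classical RK4; returns the x trajectory."""
    x, v = x0, v0
    xs = [x]
    for _ in range(steps):
        k1x, k1v = vdp_derivatives(x, v, mu)
        k2x, k2v = vdp_derivatives(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v, mu)
        k3x, k3v = vdp_derivatives(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v, mu)
        k4x, k4v = vdp_derivatives(x + dt * k3x, v + dt * k3v, mu)
        x += dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        xs.append(x)
    return xs


trajectory = simulate_vdp()
# After the transient, the oscillator settles onto a limit cycle;
# for mu = 1 its amplitude is known to be close to 2.
late_amplitude = max(abs(x) for x in trajectory[-2000:])
print(f"late-time amplitude: {late_amplitude:.2f}")
```

The self-sustained limit cycle, reached regardless of the small initial displacement, is the nonlinear behaviour that makes this family of ODEs a candidate model for oscillatory neural activity.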

Learning deep representations by mutual information estimation and maximization

(Rex) Devon Hjelm

Alex Fedorov

Samuel Lavoie-Marchildon

Karan Grewal

Adam Trischler

Phil Bachman

Yoshua Bengio

This work investigates unsupervised learning of representations by maximizing mutual information between an input and the output of a deep neural network encoder. Importantly, we show that structure matters: incorporating knowledge about locality in the input into the objective can significantly improve a representation’s suitability for downstream tasks. We further control characteristics of the representation by matching to a prior distribution adversarially. Our method, which we call Deep InfoMax (DIM), outperforms a number of popular unsupervised learning methods and compares favorably with fully-supervised learning on several classification tasks with some standard architectures. DIM opens new avenues for unsupervised learning of representations and is an important step towards flexible formulations of representation learning objectives for specific end-goals.

2019-01-01

ICLR (published)

openreview.net

Learning to Learn without Forgetting By Maximizing Transfer and Minimizing Interference

Matthew D Riemer

Ignacio Cases

Robert Ajemian

Miao Liu

Irina Rish

Yuhai Tu

Gerald Tesauro

Lack of performance when it comes to continual learning over non-stationary distributions of data remains a major challenge in scaling neural network learning to more human-realistic settings. In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples. We then propose a new algorithm, Meta-Experience Replay (MER), that directly exploits this view by combining experience replay with optimization-based meta-learning. This method learns parameters that make interference based on future gradients less likely and transfer based on future gradients more likely. We conduct experiments across continual lifelong supervised learning benchmarks and non-stationary reinforcement learning environments demonstrating that our approach consistently outperforms recently proposed baselines for continual learning. Our experiments show that the gap between the performance of MER and baseline algorithms grows both as the environment gets more non-stationary and as the fraction of the total experiences stored gets smaller.

2019-01-01

ICLR.cc/2019/Conference (poster)

openreview.net

LF-PPL: A Low-Level First Order Probabilistic Programming Language for Non-Differentiable Models

Yuanshuo Zhou

Bradley Gram-Hansen

Tobias Kohn

Tom Rainforth

Hongseok Yang

F. Wood

We develop a new Low-level, First-order Probabilistic Programming Language (LF-PPL) suited for models containing a mix of continuous, discrete, and/or piecewise-continuous variables. The key success of this language and its compilation scheme is in its ability to automatically identify the parameters with respect to which the density function is discontinuous, while further providing runtime checks for boundary crossings. This enables the introduction of new inference engines that are able to exploit gradient information, while remaining efficient for models which are not everywhere differentiable. We demonstrate this ability by incorporating a discontinuous Hamiltonian Monte Carlo (DHMC) inference engine that is able to deliver automated and efficient inference for non-differentiable models. Our system is backed up by a mathematical formalism that ensures that any model expressed in this language has a density with measure zero discontinuities to maintain the validity of the inference engine.

2019-01-01

AISTATS (published)

proceedings.mlr.press

arxiv.org

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

Kundan Kumar

Rithesh Kumar

Thibault De Boissière

Lucas Gestin

Wei Zhen Teoh

Jose Sotelo

Alexandre De Brébisson

Yoshua Bengio

Aaron Courville

arxiv.org

Mila AI Policy Fellowship

The Development of the UN Scientific Panel on AI
