Publications

Selective Brain Damage: Measuring the Disparate Impact of Model Pruning

Sara Hooker

Yann Dauphin

Andrea Frome

Neural network pruning techniques have demonstrated it is possible to remove the majority of weights in a network with surprisingly little degradation to test set accuracy. However, this measure of performance conceals significant differences in how different classes and images are impacted by pruning. We find that certain examples, which we term pruning identified exemplars (PIEs), and classes are systematically more impacted by the introduction of sparsity. Removing PIE images from the test-set greatly improves top-1 accuracy for both pruned and non-pruned models. These hard-to-generalize-to images tend to be mislabelled, of lower image quality, depict multiple objects or require fine-grained classification. These findings shed light on previously unknown trade-offs, and suggest that a high degree of caution should be exercised before pruning is used in sensitive domains.

2019-11-12

arXiv.org (preprint)

openreview.net

What Do Compressed Deep Neural Networks Forget

Sara Hooker

Gregory Clark

Yann Dauphin

Andrea Frome

Deep neural network pruning and quantization techniques have demonstrated it is possible to achieve high levels of compression with surprisingly little degradation to test set accuracy. However, this measure of performance conceals significant differences in how different classes and images are impacted by model compression techniques. We find that models with radically different numbers of weights have comparable top-line performance metrics but diverge considerably in behavior on a narrow subset of the dataset. This small subset of data points, which we term Pruning Identified Exemplars (PIEs), is systematically more impacted by the introduction of sparsity. Compression disproportionately impacts model performance on the underrepresented long-tail of the data distribution. PIEs over-index on atypical or noisy images that are far more challenging for both humans and algorithms to classify. Our work provides intuition into the role of capacity in deep neural networks and the trade-offs incurred by compression. An understanding of this disparate impact is critical given the widespread deployment of compressed models in the wild.

2019-11-12

ArXiv (preprint)

Defining ‘actionable’ high-cost health care use: results using the Canadian Institute for Health Information population grouping methodology

Maureen Anderson

Crawford W. Revie

Henrik Stryhn

Cordell Neudorf

Yvonne Rosehart

Wenbin Li

Meriç Osman

David L Buckeridge

Laura C. Rosella

Walter P. Wodchis

2019-11-09

International Journal for Equity in Health (published)

Preventing Posterior Collapse in Sequence VAEs with Pooling

Teng Long

Yanshuai Cao

Jackie CK Cheung

Variational Autoencoders (VAEs) hold great potential for modelling text, as they could in theory separate high-level semantic and syntactic properties from local regularities of natural language. Practically, however, VAEs with autoregressive decoders often suffer from posterior collapse, a phenomenon where the model learns to ignore the latent variables, causing the sequence VAE to degenerate into a language model. Previous works attempt to solve this problem with complex architectural changes or costly optimization schemes. In this paper, we argue that posterior collapse is caused in part by the encoder network failing to capture the input variabilities. We verify this hypothesis empirically and propose a straightforward fix using pooling. This simple technique effectively prevents posterior collapse, allowing the model to achieve significantly better data log-likelihood than standard sequence VAEs. Compared to the previous SOTA on preventing posterior collapse, we are able to achieve comparable performance while being significantly faster.

2019-11-09

ArXiv (preprint)

Adversarial target-invariant representation learning for domain generalization

Isabela Albuquerque

Joao Monteiro

Tiago Falk

Ioannis Mitliagkas

In many applications of machine learning, the training and test set data come from different distributions, or domains. A number of domain generalization strategies have been introduced with the goal of achieving good performance on out-of-distribution data. In this paper, we propose an adversarial approach to the problem. We propose a process that enforces pair-wise domain invariance while training a feature extractor over a diverse set of domains. We show that this process ensures invariance to any distribution that can be expressed as a mixture of the training domains. Following this insight, we then introduce an adversarial approach in which pair-wise divergences are estimated and minimized. Experiments on two domain generalization benchmarks for object recognition (i.e., PACS and VLCS) show that the proposed method yields higher average accuracy on the target domains in comparison to previously introduced adversarial strategies, as well as recently proposed methods based on learning invariant representations.

2019-11-02

arXiv.org (preprint)

dblp.uni-trier.de

Cascaded Gaussian Processes for Data-efficient Robot Dynamics Learning

Sahand Rezaei-Shoshtari

David Meger

Inna Sharf

Motivated by the recursive Newton-Euler formulation, we propose a novel cascaded Gaussian process learning framework for the inverse dynamics of robot manipulators. This approach leads to a significant dimensionality reduction which in turn results in better learning and data efficiency. We explore two formulations for the cascading: the inward and outward, both along the manipulator chain topology. The learned modeling is tested in conjunction with the classical inverse dynamics model (semi-parametric) and on its own (non-parametric) in the context of feed-forward control of the arm. Experimental results are obtained with Jaco 2 six-DOF and SARCOS seven-DOF manipulators for randomly defined sinusoidal motions of the joints in order to evaluate the performance of cascading against the standard GP learning. In addition, experiments are conducted using Jaco 2 on a task emulating a pouring maneuver. Results indicate a consistent improvement in learning speed with the inward cascaded GP model and an overall improvement in data efficiency and generalization.

2019-11-02

2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (published)

Deep Generative Modeling of LiDAR Data

Lucas Caccia

Herke van Hoof

Joelle Pineau

Building models capable of generating structured output is a key challenge for AI and robotics. While generative models have been explored on many types of data, little work has been done on synthesizing lidar scans, which play a key role in robot mapping and localization. In this work, we show that one can adapt deep generative models for this task by unravelling lidar scans into a 2D point map. Our approach can generate high quality samples, while simultaneously learning a meaningful latent representation of the data. We demonstrate significant improvements against state-of-the-art point cloud generation methods. Furthermore, we propose a novel data representation that augments the 2D signal with absolute positional information. We show that this helps robustness to noisy and imputed input; the learned model can recover the underlying lidar scan from seemingly uninformative data.

2019-11-02

IEEE/RSJ International Conference on Intelligent Robots and Systems (published)

Mohammad Javad Darvishi Bayazi

Generalizing to unseen domains via distribution matching

Isabela Albuquerque

Joao Monteiro

Tiago Falk

Ioannis Mitliagkas

Supervised learning results typically rely on assumptions of i.i.d. data. Unfortunately, those assumptions are commonly violated in practice. In this work, we tackle this problem by focusing on domain generalization: a formalization where the data generating process at test time may yield samples from never-before-seen domains (distributions). Our work relies on a simple lemma: by minimizing a notion of discrepancy between all pairs from a set of given domains, we also minimize the discrepancy between any pairs of mixtures of domains. Using this result, we derive a generalization bound for our setting. We then show that low risk over unseen domains can be achieved by representing the data in a space where (i) the training distributions are indistinguishable, and (ii) relevant information for the task at hand is preserved. Minimizing the terms in our bound yields an adversarial formulation which estimates and minimizes pairwise discrepancies. We validate our proposed strategy on standard domain generalization benchmarks, outperforming a number of recently introduced methods. Notably, we tackle a real-world application where the underlying data corresponds to multi-channel electroencephalography time series from different subjects, each considered as a distinct domain.

2019-11-02

ArXiv (preprint)

Batch Weight for Domain Adaptation With Mass Shift

Mikolaj Binkowski

R Devon Hjelm

Unsupervised domain transfer is the task of transferring or translating samples from a source distribution to a different target distribution. Current solutions to unsupervised domain transfer often operate on data on which the modes of the distribution are well-matched, for instance data that have the same frequencies of classes between source and target distributions. However, these models do not perform well when the modes are not well-matched, as would be the case when samples are drawn independently from two different, but related, domains. This mode imbalance is problematic as generative adversarial networks (GANs), a successful approach in this setting, are sensitive to mode frequency, which results in a mismatch of semantics between source samples and generated samples of the target distribution. We propose a principled method of re-weighting training samples to correct for such mass shift between the transferred distributions, which we call batch weight. We also provide a rigorous probabilistic setting for domain transfer and a new simplified objective for training transfer networks, an alternative to the complex, multi-component loss functions used in the current state-of-the-art image-to-image translation models. The new objective stems from the discrimination of joint distributions and enforces cycle-consistency in an abstract, high-level, rather than pixel-wise, sense. Lastly, we experimentally show the effectiveness of the proposed methods in several image-to-image translation tasks.

2019-11-01

2019 IEEE/CVF International Conference on Computer Vision (ICCV) (published)

Improved Conditional VRNNs for Video Prediction

Lluis Castrejon

Nicolas Ballas

Predicting future frames for a video sequence is a challenging generative modeling task. Promising approaches include probabilistic latent variable models such as the Variational Auto-Encoder. While VAEs can handle uncertainty and model multiple possible future outcomes, they have a tendency to produce blurry predictions. In this work we argue that this is a sign of underfitting. To address this issue, we propose to increase the expressiveness of the latent distributions and to use higher capacity likelihood models. Our approach relies on a hierarchy of latent variables, which defines a family of flexible prior and posterior distributions in order to better model the probability of future sequences. We validate our proposal through a series of ablation experiments and compare our approach to current state-of-the-art latent variable models. Our method performs favorably under several metrics in three different datasets.

2019-11-01

2019 IEEE/CVF International Conference on Computer Vision (ICCV) (published)

Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction

Alaaeldin El-Nouby

Shikhar Sharma

Hannes Schulz

Devon Hjelm

Layla El Asri

Samira Ebrahimi Kahou

Yoshua Bengio

Graham W. Taylor

Conditional text-to-image generation is an active area of research, with many possible applications. Existing research has primarily focused on generating a single image from available conditioning information in one step. One practical extension beyond one-step generation is a system that generates an image iteratively, conditioned on ongoing linguistic input or feedback. This is significantly more challenging than one-step generation tasks, as such a system must understand the contents of its generated images with respect to the feedback history, the current feedback, as well as the interactions among concepts present in the feedback history. In this work, we present a recurrent image generation model which takes into account both the generated output up to the current step as well as all past instructions for generation. We show that our model is able to generate the background, add new objects, and apply simple transformations to existing objects. We believe our approach is an important step toward interactive generation. Code and data are available at: https://www.microsoft.com/en-us/research/project/generative-neural-visual-artist-geneva/ .

2019-11-01

2019 IEEE/CVF International Conference on Computer Vision (ICCV) (published)