Publications

Near-Optimal Glimpse Sequences for Improved Hard Attention Neural Network Training
William Harvey
Michael Teng
Hard visual attention is a promising approach to reduce the computational burden of modern computer vision methodologies. However, hard atte… (see more)ntion mechanisms can be difficult and slow to train, which is especially costly for applications like neural architecture search where multiple networks must be trained. We introduce a method to amortise the cost of training by generating an extra supervision signal for a subset of the training data. This supervision is in the form of sequences of ‘good’ locations to attend to for each image. We find that the best method to generate supervision sequences comes from framing hard attention for image classification as a Bayesian optimal experimental design (BOED) problem. From this perspective, the optimal locations to attend to are those which provide the greatest expected reduction in the entropy of the classification distribution. We introduce methodology from the BOED literature to approximate this optimal behaviour and generate ‘near-optimal’ supervision sequences. We then present a hard attention network training objective that makes use of these sequences and show that it allows faster training than prior work. We finally demonstrate the utility of faster hard attention training by incorporating supervision sequences in a neural architecture search, resulting in hard attention architectures which can outperform networks with access to the entire image.
Stochastic Neural Network with Kronecker Flow
Chin-Wei Huang
Ahmed Touati
Alexandre Lacoste
Recent advances in variational inference enable the modelling of highly structured joint distributions, but are limited in their capacity to… (see more) scale to the high-dimensional setting of stochastic neural networks. This limitation motivates a need for scalable parameterizations of the noise generation process, in a manner that adequately captures the dependencies among the various parameters. In this work, we address this need and present the Kronecker Flow, a generalization of the Kronecker product to invertible mappings designed for stochastic neural networks. We apply our method to variational Bayesian neural networks on predictive tasks, PAC-Bayes generalization bound estimation, and approximate Thompson sampling in contextual bandits. In all setups, our methods prove to be competitive with existing methods and better than the baselines.
Dissociating memory accessibility and precision in forgetting
S. Berens
A. Horner
Dynamic spectrum access under partial observations: A restless bandit approach
Nima Akbarzadeh
We consider a communication system where multiple unknown channels are available for transmission. Each channel is a channel with state whic… (see more)h evolves in a Markov manner. The transmitter has to select L channels to use and also decide the resources (e.g., power, rate, etc.) to use for each of the selected channels. It observes the state of the channels it uses and receives no feedback on the state of the other channels. We model this problem as a partially observable Markov decision process and obtain a simplified belief state. We show that the optimal resource allocation policy can be identified in closed form. Once the optimal resource allocation policy is fixed, choosing the channel scheduling policy may be viewed as a restless bandit. We present an efficient algorithm to check indexability and compute the Whittle index for each channel. When the model is indexable, the Whittle index policy, which transmits over the L channels with the smallest Whittle indices, is an attractive heuristic policy.
Understanding the Behaviour of Neural Abstractive Summarizers using Contrastive Examples
Krtin Kumar
Neural abstractive summarizers generate summary texts using a language model conditioned on the input source text, and have recently achieve… (see more)d high ROUGE scores on benchmark summarization datasets. We investigate how they achieve this performance with respect to human-written gold-standard abstracts, and whether the systems are able to understand deeper syntactic and semantic structures. We generate a set of contrastive summaries which are perturbed, deficient versions of human-written summaries, and test whether existing neural summarizers score them more highly than the human-written summaries. We analyze their performance on different datasets and find that these systems fail to understand the source text, in a majority of the cases.
Unsupervised Controllable Text Generation with Global Variation Discovery and Disentanglement
Peng Xu
Yanshuai Cao
Existing controllable text generation systems rely on annotated attributes, which greatly limits their capabilities and applications. In thi… (see more)s work, we make the first successful attempt to use VAEs to achieve controllable text generation without supervision. We do so by decomposing the latent space of the VAE into two parts: one incorporates structural constraints to capture dominant global variations implicitly present in the data, e.g., sentiment or topic; the other is unstructured and is used for the reconstruction of the source sentences. With the enforced structural constraint, the underlying global variations will be discovered and disentangled during the training of the VAE. The structural constraint also provides a natural recipe for mitigating posterior collapse for the structured part, which cannot be fully resolved by the existing techniques. On the task of text style transfer, our unsupervised approach achieves significantly better performance than previous supervised approaches. By showcasing generation with finer-grained control including Cards-Against-Humanity-style topic transitions within a sentence, we demonstrate that our model can perform controlled text generation in a more flexible way than existing methods.
Activity-Based Analysis of Open Source Software Contributors: Roles and Dynamics
Jinghui Cheng
Contributors to open source software (OSS) communities assume diverse roles to take different responsibilities. One major limitation of the … (see more)current OSS tools and platforms is that they provide a uniform user interface regardless of the activities performed by the various types of contributors. This paper serves as a non-trivial first step towards resolving this challenge by demonstrating a methodology and establishing knowledge to understand how the contributors' roles and their dynamics, reflected in the activities contributors perform, are exhibited in OSS communities. Based on an analysis of user action data from 29 GitHub projects, we extracted six activities that distinguished four Active roles and five Supporting roles of OSS contributors, as well as patterns in role changes. Through the lens of the Activity Theory, these findings provided rich design guidelines for OSS tools to support diverse contributor roles.
Fixing Bias in Reconstruction-based Anomaly Detection with Lipschitz Discriminators
Alexander Tong
Smita Krishnaswamy
Anomaly detection is of great interest in fields where abnormalities need to be identified and corrected (e.g., medicine and finance). Deep … (see more)learning methods for this task often rely on autoencoder reconstruction error, sometimes in conjunction with other penalties. We show that this approach exhibits intrinsic biases that lead to undesirable results. Reconstruction-based methods can sometimes show low error on simple-to-reconstruct points that are not part of the training data, for example the all black image. Instead, we introduce a new unsupervised Lipschitz anomaly discriminator (LAD) that does not suffer from these biases. Our anomaly discriminator is trained, similar to the discriminator of a GAN, to detect the difference between the training data and corruptions of the training data. We show that this procedure successfully detects unseen anomalies with guarantees on those that have a certain Wasserstein distance from the data or corrupted training set. These additions allow us to show improved performance on MNIST, CIFAR10, and health record data. Further, LAD does not require decoding back to the original data space, which makes anomaly detection possible in domains where it is difficult to define a decoder, such as in irregular graph structured data. Empirically, we show this framework leads to improved performance on image, health record, and graph data.
Analysis and Detection of Information Types of Open Source Software Issue Discussions
Deeksha M. Arya
Wenting Wang
Jinghui Cheng
Most modern Issue Tracking Systems (ITSs) for open source software (OSS) projects allow users to add comments to issues. Over time, these co… (see more)mments accumulate into discussion threads embedded with rich information about the software project, which can potentially satisfy the diverse needs of OSS stakeholders. However, discovering and retrieving relevant information from the discussion threads is a challenging task, especially when the discussions are lengthy and the number of issues in ITSs are vast. In this paper, we address this challenge by identifying the information types presented in OSS issue discussions. Through qualitative content analysis of 15 complex issue threads across three projects hosted on GitHub, we uncovered 16 information types and created a labeled corpus containing 4656 sentences. Our investigation of supervised, automated classification techniques indicated that, when prior knowledge about the issue is available, Random Forest can effectively detect most sentence types using conversational features such as the sentence length and its position. When classifying sentences from new issues, Logistic Regression can yield satisfactory performance using textual features for certain information types, while falling short on others. Our work represents a nontrivial first step towards tools and techniques for identifying and obtaining the rich information recorded in the ITSs to support various software engineering activities and to satisfy the diverse needs of OSS stakeholders.
Fairwashing: the risk of rationalization
Hiromi Arai
Olivier Fortineau
Sébastien Gambs
Satoshi Hara
Alain Tapp
Black-box explanation is the problem of explaining how a machine learning model -- whose internal logic is hidden to the auditor and general… (see more)ly complex -- produces its outcomes. Current approaches for solving this problem include model explanation, outcome explanation as well as model inspection. While these techniques can be beneficial by providing interpretability, they can be used in a negative manner to perform fairwashing, which we define as promoting the false perception that a machine learning model respects some ethical values. In particular, we demonstrate that it is possible to systematically rationalize decisions taken by an unfair black-box model using the model explanation as well as the outcome explanation approaches with a given fairness metric. Our solution, LaundryML, is based on a regularized rule list enumeration algorithm whose objective is to search for fair rule lists approximating an unfair black-box model. We empirically evaluate our rationalization technique on black-box models trained on real-world datasets and show that one can obtain rule lists with high fidelity to the black-box model while being considerably less unfair at the same time.
N-BEATS: Neural basis expansion analysis for interpretable time series forecasting
Boris Oreshkin
Dmitri Carpov
We focus on solving the univariate times series point forecasting problem using deep learning. We propose a deep neural architecture based o… (see more)n backward and forward residual links and a very deep stack of fully-connected layers. The architecture has a number of desirable properties, being interpretable, applicable without modification to a wide array of target domains, and fast to train. We test the proposed architecture on several well-known datasets, including M3, M4 and TOURISM competition datasets containing time series from diverse domains. We demonstrate state-of-the-art performance for two configurations of N-BEATS for all the datasets, improving forecast accuracy by 11% over a statistical benchmark and by 3% over last year's winner of the M4 competition, a domain-adjusted hand-crafted hybrid between neural network and statistical time series models. The first configuration of our model does not employ any time-series-specific components and its performance on heterogeneous datasets strongly suggests that, contrarily to received wisdom, deep learning primitives such as residual blocks are by themselves sufficient to solve a wide range of forecasting problems. Finally, we demonstrate how the proposed architecture can be augmented to provide outputs that are interpretable without considerable loss in accuracy.
Prediction of Disease Progression in Multiple Sclerosis Patients using Deep Learning Analysis of MRI Data
Adrian Tousignant
Paul Lemaitre
Douglas Arnold
We present the first automatic end-to-end deep learning framework for the prediction of future patient disability progression (one year from… (see more) baseline) based on multi-modal brain Magnetic Resonance Images (MRI) of patients with Multiple Sclerosis (MS). The model uses parallel convolutional pathways, an idea introduced by the popular Inception net (Szegedy et al., 2015) and is trained and tested on two large proprietary, multi-scanner, multi-center, clinical trial datasets of patients with Relapsing-Remitting Multiple Sclerosis (RRMS). Experiments on 465 patients on the placebo arms of the trials indicate that the model can accurately predict future disease progression, measured by a sustained increase in the extended disability status scale (EDSS) score over time. Using only the multi-modal MRI provided at baseline, the model achieves an AUC of 0.66±0.055. However, when supplemental lesion label masks are provided as inputs as well, the AUC increases to 0.701± 0.027. Furthermore, we demonstrate that uncertainty estimates based on Monte Carlo dropout sample variance correlate with errors made by the model. Clinicians provided with the predictions computed by the model can therefore use the associated uncertainty estimates to assess which scans require further examination.