Publications

Unifying Variational Inference and PAC-Bayes for Supervised Learning that Scales
Neural Network based controllers hold enormous potential to learn complex, high-dimensional functions. However, they are prone to overfittin… (see more)g and unwarranted extrapolations. PAC Bayes is a generalized framework which is more resistant to overfitting and that yields performance bounds that hold with arbitrarily high probability even on the unjustified extrapolations. However, optimizing to learn such a function and a bound is intractable for complex tasks. In this work, we propose a method to simultaneously learn such a function and estimate performance bounds that scale organically to high-dimensions, non-linear environments without making any explicit assumptions about the environment. We build our approach on a parallel that we draw between the formulations called ELBO and PAC Bayes when the risk metric is negative log likelihood. Through our experiments on multiple high dimensional MuJoCo locomotion tasks, we validate the correctness of our theory, show its ability to generalize better, and investigate the factors that are important for its learning. The code for all the experiments is available at this https URL.
Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery
We release the largest public ECG dataset of continuous raw signals for representation learning containing 11 thousand patients and 2 billio… (see more)n labelled beats. Our goal is to enable semi-supervised ECG models to be made as well as to discover unknown subtypes of arrhythmia and anomalous ECG signal events. To this end, we propose an unsupervised representation learning task, evaluated in a semi-supervised fashion. We provide a set of baselines for different feature extractors that can be built upon. Additionally, we perform qualitative evaluations on results from PCA embeddings, where we identify some clustering of known subtypes indicating the potential for representation learning in arrhythmia sub-type discovery.
Retrieving Signals with Deep Complex Extractors
Ousmane Dia
Mirco Ravanaelli
Christopher Pal
Recent advances have made it possible to create deep complex-valued neural networks. Despite this progress, many challenging learning tasks … (see more)have yet to leverage the power of complex representations. Building on recent advances, we propose a new deep complex-valued method for signal retrieval and extraction in the frequency domain. As a case study, we perform audio source separation in the Fourier domain. Our new method takes advantage of the convolution theorem which states that the Fourier transform of two convolved signals is the elementwise product of their Fourier transforms. Our novel method is based on a complex-valued version of Feature-Wise Linear Modulation (FiLM) and serves as the keystone of our proposed signal extraction method. We also introduce a new and explicit amplitude and phase-aware loss, which is scale and time invariant, taking into account the complex-valued components of the spectrogram. Using the Wall Street Journal Dataset, we compared our phase-aware loss to several others that operate both in the time and frequency domains and demonstrate the effectiveness of our proposed signal extraction method and proposed loss.
Continual Learning of New Sound Classes Using Generative Replay
Zhepei Wang
Yusuf Cem Sübakan
Efthymios Tzinis
Paris Smaragdis
Continual learning consists in incrementally training a model on a sequence of datasets and testing on the union of all datasets. In this pa… (see more)per, we examine continual learning for the problem of sound classification, in which we wish to refine already trained models to learn new sound classes. In practice one does not want to maintain all past training data and retrain from scratch, but naively updating a model with new data(sets) results in a degradation of already learned tasks, which is referred to as "catastrophic forgetting." We develop a generative replay procedure for generating training audio spectrogram data, in place of keeping older training datasets. We show that by incrementally refining a classifier with generative replay a generator that is 4% of the size of all previous training data matches the performance of refining the classifier keeping 20% of all previous training data. We thus conclude that we can extend a trained sound classifier to learn new classes without having to keep previously used datasets.
Predicting ice flow using machine learning
S. Karthik Mukkavilli
Though machine learning has achieved notable success in modeling sequential and spatial data for speech recognition and in computer vision, … (see more)applications to remote sensing and climate science problems are seldom considered. In this paper, we demonstrate techniques from unsupervised learning of future video frame prediction, to increase the accuracy of ice flow tracking in multi-spectral satellite images. As the volume of cryosphere data increases in coming years, this is an interesting and important opportunity for machine learning to address a global challenge for climate change, risk management from floods, and conserving freshwater resources. Future frame prediction of ice melt and tracking the optical flow of ice dynamics presents modeling difficulties, due to uncertainties in global temperature increase, changing precipitation patterns, occlusion from cloud cover, rapid melting and glacier retreat due to black carbon aerosol deposition, from wildfires or human fossil emissions. We show the adversarial learning method helps improve the accuracy of tracking the optical flow of ice dynamics compared to existing methods in climate science. We present a dataset, IceNet, to encourage machine learning research and to help facilitate further applications in the areas of cryospheric science and climate change.
A language processing algorithm for predicting tactical solutions to an operational planning problem under uncertainty
Eric Larsen
This paper is devoted to the prediction of solutions to a stochastic discrete optimization problem. Through an application, we illustrate ho… (see more)w we can use a state-of-the-art neural machine translation (NMT) algorithm to predict the solutions by defining appropriate vocabularies, syntaxes and constraints. We attend to applications where the predictions need to be computed in very short computing time -- in the order of milliseconds or less. The results show that with minimal adaptations to the model architecture and hyperparameter tuning, the NMT algorithm can produce accurate solutions within the computing time budget. While these predictions are slightly less accurate than approximate stochastic programming solutions (sample average approximation), they can be computed faster and with less variability.
Propagating Uncertainty Across Cascaded Medical Imaging Tasks for Improved Deep Learning Inference
Raghav Mehta
Thomas Christinck
Tanya Nair
Aurélie Bussy
Paul Lemaitre
Swapna Premasiri
Manuela Costantino
Mallar Chakravarty
Douglas Arnold
Yarin Gal
Saliency Based Deep Neural Network for Automatic Detection of Gadolinium-Enhancing Multiple Sclerosis Lesions in Brain MRI
Joshua D. Durso-Finley
Douglas Arnold
SGP: Spotting Groups Polluting the Online Political Discourse
Junhao Wang
Sacha Lévy
Ren Wang
Social media sites are becoming a key factor in politics. These platforms are easy to manipulate for the purpose of distorting information s… (see more)pace to confuse and distract voters. It is of paramount importance for social media platforms, users engaged with online political discussions, as well as government agencies to understand the dynamics on social media, and identify malicious groups engaging in misinformation campaigns and thus polluting the general discourse around a topic of interest. Past works to identify such disruptive patterns are mostly focused on analyzing user-generated content such as tweets. In this study, we take a holistic approach and propose SGP to provide an informative birds eye view of all the activities in these social media sites around a broad topic and detect coordinated groups suspicious of engaging in misinformation campaigns. To show the effectiveness of SGP, we deploy it to provide a concise overview of polluting activity on Twitter around the upcoming 2019 Canadian Federal Elections, by analyzing over 60 thousand user accounts connected through 3.4 million connections and 1.3 million hashtags. Users in the polluting groups detected by SGP-flag are over 4x more likely to become suspended while majority of these highly suspicious users detected by SGP-flag escaped Twitter's suspending algorithm. Moreover, while few of the polluting hashtags detected are linked to misinformation campaigns, SGP-sig also flags others that have not been picked up on. More importantly, we also show that a large coordinated set of right-winged conservative groups based in the US are heavily engaged in Canadian politics.
Actor Critic with Differentially Private Critic
William Hamilton
Borja Balle
Reinforcement learning algorithms are known to be sample inefficient, and often performance on one task can be substantially improved by lev… (see more)eraging information (e.g., via pre-training) on other related tasks. In this work, we propose a technique to achieve such knowledge transfer in cases where agent trajectories contain sensitive or private information, such as in the healthcare domain. Our approach leverages a differentially private policy evaluation algorithm to initialize an actor-critic model and improve the effectiveness of learning in downstream tasks. We empirically show this technique increases sample efficiency in resource-constrained control problems while preserving the privacy of trajectories collected in an upstream task.
Deep learning for Aerosol Forecasting
Caleb Hoyne
S. Karthik Mukkavilli
Reanalysis datasets combining numerical physics models and limited observations to generate a synthesised estimate of variables in an Earth … (see more)system, are prone to biases against ground truth. Biases identified with the NASA Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2) aerosol optical depth (AOD) dataset, against the Aerosol Robotic Network (AERONET) ground measurements in previous studies, motivated the development of a deep learning based AOD prediction model globally. This study combines a convolutional neural network (CNN) with MERRA-2, tested against all AERONET sites. The new hybrid CNN-based model provides better estimates validated versus AERONET ground truth, than only using MERRA-2 reanalysis.
Nash Games Among Stackelberg Leaders
Gabriele Dragotto
Felipe Feijoo
Andrea Lodi
Sriram Sankaranarayanan
We analyze Nash games played among leaders of Stackelberg games (NASP). We show it is Σ p 2 - hard to decide if the game has a mixed-strate… (see more)gy Nash equilibrium (MNE), even when there are only two leaders and each leader has one follower. We provide a finite time algorithm with a running time bounded by O (2 2 n ) which computes MNEs for NASP when it exists and returns infeasibility if no MNE exists. We also provide two ways to improve the algorithm which involves constructing a series of inner approximations (alternatively, outer approximations) to the leaders’ feasible region that will provably obtain the required MNE. Finally, we test our algorithms on a range of NASPs arising out of a game in the energy market, where countries act as Stackelberg leaders who play a Nash game, and the domestic producers act as the followers.