Predicting Tactical Solutions to Operational Planning Problems under Imperfect Information
Eric P. Larsen
Sébastien Lachapelle
Andrea Lodi
This paper offers a methodological contribution at the intersection of machine learning and operations research. Namely, we propose a method… (voir plus)ology to quickly predict expected tactical descriptions of operational solutions (TDOSs). The problem we address occurs in the context of two-stage stochastic programming, where the second stage is demanding computationally. We aim to predict at a high speed the expected TDOS associated with the second-stage problem, conditionally on the first-stage variables. This may be used in support of the solution to the overall two-stage problem by avoiding the online generation of multiple second-stage scenarios and solutions. We formulate the tactical prediction problem as a stochastic optimal prediction program, whose solution we approximate with supervised machine learning. The training data set consists of a large number of deterministic operational problems generated by controlled probabilistic sampling. The labels are computed based on solutions to these problems (solved independently and offline), employing appropriate aggregation and subselection methods to address uncertainty. Results on our motivating application on load planning for rail transportation show that deep learning models produce accurate predictions in very short computing time (milliseconds or less). The predictive accuracy is close to the lower bounds calculated based on sample average approximation of the stochastic prediction programs.
Attend Before you Act: Leveraging human visual attention for continual learning
When humans perform a task, such as playing a game, they selectively pay attention to certain parts of the visual input, gathering relevant … (voir plus)information and sequentially combining it to build a representation from the sensory data. In this work, we explore leveraging where humans look in an image as an implicit indication of what is salient for decision making. We build on top of the UNREAL architecture in DeepMind Lab's 3D navigation maze environment. We train the agent both with original images and foveated images, which were generated by overlaying the original images with saliency maps generated using a real-time spectral residual technique. We investigate the effectiveness of this approach in transfer learning by measuring performance in the context of noise in the environment.
Active Search of Connections for Case Building and Combating Human Trafficking
David Bayani
Artur Dubrawski
How can we help an investigator to efficiently connect the dots and uncover the network of individuals involved in a criminal activity based… (voir plus) on the evidence of their connections, such as visiting the same address, or transacting with the same bank account? We formulate this problem as Active Search of Connections, which finds target entities that share evidence of different types with a given lead, where their relevance to the case is queried interactively from the investigator. We present RedThread, an efficient solution for inferring related and relevant nodes while incorporating the user's feedback to guide the inference. Our experiments focus on case building for combating human trafficking, where the investigator follows leads to expose organized activities, i.e. different escort advertisements that are connected and possibly orchestrated. RedThread is a local algorithm and enables online case building when mining millions of ads posted in one of the largest classified advertising websites. The results of RedThread are interpretable, as they explain how the results are connected to the initial lead. We experimentally show that RedThread learns the importance of the different types and different pieces of evidence, while the former could be transferred between cases.
Negative Momentum for Improved Game Dynamics
Reyhane Askari Hemmat
Mohammad Pezeshki
Gabriel Huang
Rémi LE PRIOL
Games generalize the single-objective optimization paradigm by introducing different objective functions for different players. Differentiab… (voir plus)le games often proceed by simultaneous or alternating gradient updates. In machine learning, games are gaining new importance through formulations like generative adversarial networks (GANs) and actor-critic systems. However, compared to single-objective optimization, game dynamics are more complex and less understood. In this paper, we analyze gradient-based methods with momentum on simple games. We prove that alternating updates are more stable than simultaneous updates. Next, we show both theoretically and empirically that alternating gradient updates with a negative momentum term achieves convergence in a difficult toy adversarial problem, but also on the notoriously difficult to train saturating GANs.
Feature-wise transformations
Vincent Dumoulin
Ethan Perez
Nathan Schucher
Florian Strub
Harm de Vries
MaD TwinNet: Masker-Denoiser Architecture with Twin Networks for Monaural Sound Source Separation
Konstantinos Drossos
Stylianos Ioannis Mimilakis
Dmitriy Serdyuk
Gerald Schuller
Tuomas Virtanen
Monaural singing voice separation task focuses on the prediction of the singing voice from a single channel music mixture signal. Current st… (voir plus)ate of the art (SOTA) results in monaural singing voice separation are obtained with deep learning based methods. In this work we present a novel recurrent neural approach that learns long-term temporal patterns and structures of a musical piece. We build upon the recently proposed Masker-Denoiser (MaD) architecture and we enhance it with the Twin Networks, a technique to regularize a recurrent generative network using a backward running copy of the network. We evaluate our method using the Demixing Secret Dataset and we obtain an increment to signal-to-distortion ratio (SDR) of 0.37 dB and to signal-to-interference ratio (SIR) of 0.23 dB, compared to previous SOTA results.
Connecting Weighted Automata and Recurrent Neural Networks through Spectral Learning
In this paper, we unravel a fundamental connection between weighted finite automata~(WFAs) and second-order recurrent neural networks~(2-RNN… (voir plus)s): in the case of sequences of discrete symbols, WFAs and 2-RNNs with linear activation functions are expressively equivalent. Motivated by this result, we build upon a recent extension of the spectral learning algorithm to vector-valued WFAs and propose the first provable learning algorithm for linear 2-RNNs defined over sequences of continuous input vectors. This algorithm relies on estimating low rank sub-blocks of the so-called Hankel tensor, from which the parameters of a linear 2-RNN can be provably recovered. The performances of the proposed method are assessed in a simulation study.
Information Fusion in Deep Convolutional Neural Networks for Biomedical Image Segmentation 1
Mohammad Havaei
Nicolas Guizard
Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data
Amjad Almahairi
Sai Rajeswar
Philip Bachman
Learning inter-domain mappings from unpaired data can improve performance in structured prediction tasks, such as image segmentation, by red… (voir plus)ucing the need for paired data. CycleGAN was recently proposed for this problem, but critically assumes the underlying inter-domain mapping is approximately deterministic and one-to-one. This assumption renders the model ineffective for tasks requiring flexible, many-to-many mappings. We propose a new model, called Augmented CycleGAN, which learns many-to-many mappings between domains. We examine Augmented CycleGAN qualitatively and quantitatively on several image datasets.
Focused Hierarchical RNNs for Conditional Sequence Processing
Nan Rosemary Ke
Konrad Żołna
Zhouhan Lin
Adam Trischler
Recurrent Neural Networks (RNNs) with attention mechanisms have obtained state-of-the-art results for many sequence processing tasks. Most o… (voir plus)f these models use a simple form of encoder with attention that looks over the entire sequence and assigns a weight to each token independently. We present a mechanism for focusing RNN encoders for sequence modelling tasks which allows them to attend to key parts of the input as needed. We formulate this using a multi-layer conditional sequence encoder that reads in one token at a time and makes a discrete decision on whether the token is relevant to the context or question being asked. The discrete gating mechanism takes in the context embedding and the current hidden state as inputs and controls information flow into the layer above. We train it using policy gradient methods. We evaluate this method on several types of tasks with different attributes. First, we evaluate the method on synthetic tasks which allow us to evaluate the model for its generalization ability and probe the behavior of the gates in more controlled settings. We then evaluate this approach on large scale Question Answering tasks including the challenging MS MARCO and SearchQA tasks. Our models shows consistent improvements for both tasks over prior work and our baselines. It has also shown to generalize significantly better on synthetic tasks as compared to the baselines.
Mutual Information Neural Estimation
Ishmael Belghazi
Aristide Baratin
Sai Rajeswar
Sherjil Ozair
We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent … (voir plus)over neural networks. We present a Mutual Information Neural Estimator (MINE) that is linearly scalable in dimensionality as well as in sample size, trainable through back-prop, and strongly consistent. We present a handful of applications on which MINE can be used to minimize or maximize mutual information. We apply MINE to improve adversarially trained generative models. We also use MINE to implement Information Bottleneck, applying it to supervised classification; our results demonstrate substantial improvement in flexibility and performance in these settings.
Neural Autoregressive Flows
Chin-Wei Huang
Alexandre Lacoste
Normalizing flows and autoregressive models have been successfully combined to produce state-of-the-art results in density estimation, via M… (voir plus)asked Autoregressive Flows (MAF), and to accelerate state-of-the-art WaveNet-based speech synthesis to 20x faster than real-time, via Inverse Autoregressive Flows (IAF). We unify and generalize these approaches, replacing the (conditionally) affine univariate transformations of MAF/IAF with a more general class of invertible univariate transformations expressed as monotonic neural networks. We demonstrate that the proposed neural autoregressive flows (NAF) are universal approximators for continuous probability distributions, and their greater expressivity allows them to better capture multimodal target distributions. Experimentally, NAF yields state-of-the-art performance on a suite of density estimation tasks and outperforms IAF in variational autoencoders trained on binarized MNIST.