Generating Contradictory, Neutral, and Entailing Sentences
Yikang Shen
Shawn Tan
Chin-Wei Huang
Learning distributed sentence representations remains an interesting problem in the field of Natural Language Processing (NLP). We want to learn a model that approximates the conditional latent space over the representations of a logical antecedent of the given statement. In our paper, we propose an approach to generating sentences, conditioned on an input sentence and a logical inference label. We do this by modeling the different possibilities for the output sentence as a distribution over the latent representation, which we train using an adversarial objective. We evaluate the model using two state-of-the-art models for the Recognizing Textual Entailment (RTE) task, and measure the BLEU scores against the actual sentences as a probe for the diversity of sentences produced by our model. The experiment results show that, given our framework, we have clear ways to improve the quality and diversity of generated sentences.
A polynomial algorithm for a continuous bilevel knapsack problem
Andrea Lodi
Patrice Marcotte
Learning Anonymized Representations with Adversarial Neural Networks
Clément Feutry
P. Duhamel
Statistical methods protecting sensitive information or the identity of the data owner have become critical to ensure privacy of individuals as well as of organizations. This paper investigates anonymization methods based on representation learning and deep neural networks, and motivated by novel information theoretical bounds. We introduce a novel training objective for simultaneously training a predictor over target variables of interest (the regular labels) while preventing an intermediate representation to be predictive of the private labels. The architecture is based on three sub-networks: one going from input to representation, one from representation to predicted regular labels, and one from representation to predicted private labels. The training procedure aims at learning representations that preserve the relevant part of the information (about regular labels) while dismissing information about the private labels which correspond to the identity of a person. We demonstrate the success of this approach for two distinct classification versus anonymization tasks (handwritten digits and sentiment analysis).
A Walk with SGD
Chen Xing
Devansh Arpit
Christos Tsirigotis
Exploring why stochastic gradient descent (SGD) based optimization methods train deep neural networks (DNNs) that generalize well has become an active area of research. Towards this end, we empirically study the dynamics of SGD when training over-parametrized DNNs. Specifically we study the DNN loss surface along the trajectory of SGD by interpolating the loss surface between parameters from consecutive iterations and tracking various metrics during training. We find that the loss interpolation between parameters before and after a training update is roughly convex with a minimum (valley floor) in between for most of the training. Based on this and other metrics, we deduce that during most of the training, SGD explores regions in a valley by bouncing off valley walls at a height above the valley floor. This 'bouncing off walls at a height' mechanism helps SGD traverse larger distance for small batch sizes and large learning rates which we find play qualitatively different roles in the dynamics. While a large learning rate maintains a large height from the valley floor, a small batch size injects noise facilitating exploration. We find this mechanism is crucial for generalization because the valley floor has barriers and this exploration above the valley floor allows SGD to quickly travel far away from the initialization point (without being affected by barriers) and find flatter regions, corresponding to better generalization.
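The interpolation probe described in this abstract can be illustrated on a toy problem. The sketch below is not the authors' code: it assumes a one-parameter least-squares model in place of a DNN, and the names `loss`, `grad`, `interpolate_loss`, and `has_interior_minimum`, the data, and the learning rate are all hypothetical. It evaluates the loss at evenly spaced points between the parameters before and after one gradient update and checks whether the lowest point lies strictly in between.

```python
def loss(theta, batch):
    # Mean squared error of the toy model y = theta * x (stand-in for a DNN loss).
    return sum((theta * x - y) ** 2 for x, y in batch) / len(batch)

def grad(theta, batch):
    # Derivative of the loss above with respect to theta.
    return sum(2 * (theta * x - y) * x for x, y in batch) / len(batch)

def interpolate_loss(theta_a, theta_b, batch, num_points=11):
    # Loss at evenly spaced points on the segment from theta_a to theta_b.
    alphas = [i / (num_points - 1) for i in range(num_points)]
    return [loss((1 - a) * theta_a + a * theta_b, batch) for a in alphas]

def has_interior_minimum(values):
    # True if the smallest value is strictly inside the segment,
    # i.e. the update stepped across a "valley floor".
    i = min(range(len(values)), key=values.__getitem__)
    return 0 < i < len(values) - 1

# Toy data generated by y = 3x, so the loss minimum is at theta = 3.
data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]

theta = 0.0
lr = 0.4  # hypothetical, chosen large enough that the step overshoots
new_theta = theta - lr * grad(theta, data)
profile = interpolate_loss(theta, new_theta, data)
crossed_floor = has_interior_minimum(profile)
```

With this learning rate the update moves from one side of the minimum to the other, so the interpolated loss dips in between; a much smaller step would leave the lowest point at the end of the segment.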
Generalization in Machine Learning via Analytical Learning Theory
Kenji Kawaguchi
This paper introduces a novel measure-theoretic theory for machine learning that does not require statistical assumptions. Based on this theory, a new regularization method in deep learning is derived and shown to outperform previous methods in CIFAR-10, CIFAR-100, and SVHN. Moreover, the proposed theory provides a theoretical basis for a family of practically successful regularization methods in deep learning. We discuss several consequences of our results on one-shot learning, representation learning, deep learning, and curriculum learning. Unlike statistical learning theory, the proposed learning theory analyzes each problem instance individually via measure theory, rather than a set of problem instances via statistics. As a result, it provides different types of results and insights when compared to statistical learning theory.
Towards Understanding Generalization via Analytical Learning Theory
Kenji Kawaguchi
This paper introduces a novel measure-theoretic theory for machine learning that does not require statistical assumptions. Based on this theory, a new regularization method in deep learning is derived and shown to outperform previous methods in CIFAR-10, CIFAR-100, and SVHN. Moreover, the proposed theory provides a theoretical basis for a family of practically successful regularization methods in deep learning. We discuss several consequences of our results on one-shot learning, representation learning, deep learning, and curriculum learning. Unlike statistical learning theory, the proposed learning theory analyzes each problem instance individually via measure theory, rather than a set of problem instances via statistics. As a result, it provides different types of results and insights when compared to statistical learning theory.
Boundary Seeking GANs
Athul Jacob
Adam Trischler
Gerry Che
Kyunghyun Cho
Generative adversarial networks are a learning framework that rely on training a discriminator to estimate a measure of difference between a target and generated distributions. GANs, as normally formulated, rely on the generated samples being completely differentiable w.r.t. the generative parameters, and thus do not work for discrete data. We introduce a method for training GANs with discrete data that uses the estimated difference measure from the discriminator to compute importance weights for generated samples, thus providing a policy gradient for training the generator. The importance weights have a strong connection to the decision boundary of the discriminator, and we call our method boundary-seeking GANs (BGANs). We demonstrate the effectiveness of the proposed algorithm with discrete image and character-based natural language generation. In addition, the boundary-seeking objective extends to continuous data, which can be used to improve stability of training, and we demonstrate this on Celeba, Large-scale Scene Understanding (LSUN) bedrooms, and Imagenet without conditioning.
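As a rough illustration of turning discriminator outputs into importance weights, the sketch below assumes a discriminator with a sigmoid output D(x), so that the density-ratio estimate D(x) / (1 - D(x)) equals the exponential of the discriminator's logit. The abstract does not state the exact weighting, so this form, and the function name `normalized_importance_weights`, are assumptions rather than the paper's definition.

```python
import math

def normalized_importance_weights(logits):
    # Assumed form: w_i = D_i / (1 - D_i) = exp(logit_i) for a sigmoid
    # discriminator, normalized over the generated samples. Subtracting the
    # maximum logit first keeps exp() from overflowing.
    m = max(logits)
    unnormalized = [math.exp(l - m) for l in logits]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]

# Hypothetical discriminator logits for four generated samples.
logits = [2.0, 0.0, -1.0, 0.5]
weights = normalized_importance_weights(logits)
```

Samples the discriminator rates as more realistic receive larger weights, and the weights sum to one, so they could scale per-sample log-likelihood gradients in a policy-gradient style generator update.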
Combining Model-based and Model-free RL via Multi-step Control Variates
Tong Che
Yuchen Lu
George Tucker
Surya Bhupatiraju
Shane Gu
Sergey Levine
Existence of Nash Equilibria on Integer Programming Games
Andrea Lodi
João Pedro Pedroso