Publications

Generating Contradictory, Neutral, and Entailing Sentences

Yikang Shen

Shawn Tan

Chin-Wei Huang

Learning distributed sentence representations remains an interesting problem in the field of Natural Language Processing (NLP). We want to learn a model that approximates the conditional latent space over the representations of a logical antecedent of the given statement. In our paper, we propose an approach to generating sentences, conditioned on an input sentence and a logical inference label. We do this by modeling the different possibilities for the output sentence as a distribution over the latent representation, which we train using an adversarial objective. We evaluate the model using two state-of-the-art models for the Recognizing Textual Entailment (RTE) task, and measure the BLEU scores against the actual sentences as a probe for the diversity of sentences produced by our model. The experimental results show that, given our framework, we have clear ways to improve the quality and diversity of generated sentences.

2018-03-07

arXiv (preprint)

arxiv.org

A polynomial algorithm for a continuous bilevel knapsack problem

Margarida Carvalho

Andrea Lodi

Patrice Marcotte

2018-03-01

Operations Research Letters (published)

doi.org

Learning Anonymized Representations with Adversarial Neural Networks

Clément Feutry

Pablo Piantanida

Yoshua Bengio

P. Duhamel

Statistical methods protecting sensitive information or the identity of the data owner have become critical to ensure the privacy of individuals as well as of organizations. This paper investigates anonymization methods based on representation learning and deep neural networks, motivated by novel information-theoretic bounds. We introduce a novel training objective for simultaneously training a predictor over target variables of interest (the regular labels) while preventing an intermediate representation from being predictive of the private labels. The architecture is based on three sub-networks: one going from input to representation, one from representation to predicted regular labels, and one from representation to predicted private labels. The training procedure aims at learning representations that preserve the relevant part of the information (about regular labels) while dismissing information about the private labels, which correspond to the identity of a person. We demonstrate the success of this approach for two distinct classification versus anonymization tasks (handwritten digits and sentiment analysis).

2018-02-26

arXiv (preprint)

arxiv.org
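
The trade-off described in the abstract above (keep information about the regular labels, discard information about the private labels) can be illustrated with a minimal sketch. It assumes one common way of combining the two terms, a task loss minus a weighted private-label loss; the paper's exact objective may differ. The function names, the `trade_off` parameter, and the toy probabilities are all hypothetical.

```python
import math

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true label."""
    return -math.log(probs[label])

def encoder_objective(regular_probs, regular_label,
                      private_probs, private_label, trade_off=1.0):
    """Quantity the encoder would minimize in this sketch (an assumed form).

    A low task loss means the regular label is well predicted; a high
    private loss means the private-label branch cannot recover the identity.
    """
    task_loss = cross_entropy(regular_probs, regular_label)
    private_loss = cross_entropy(private_probs, private_label)
    return task_loss - trade_off * private_loss

# Toy outputs of the regular-label branch (same in both cases).
regular_probs, regular_label = [0.8, 0.2], 0

# Private-label branch: one representation leaks identity, one does not.
leaky = encoder_objective(regular_probs, regular_label, [0.9, 0.1], 0)
anonymized = encoder_objective(regular_probs, regular_label, [0.5, 0.5], 0)
```

With these toy numbers the anonymized representation scores lower (better) than the leaky one, because the private-label branch is reduced to chance while the regular-label prediction is unchanged.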

A Walk with SGD

Chen Xing

Devansh Arpit

Christos Tsirigotis

Yoshua Bengio

Exploring why stochastic gradient descent (SGD) based optimization methods train deep neural networks (DNNs) that generalize well has become an active area of research. Towards this end, we empirically study the dynamics of SGD when training over-parametrized DNNs. Specifically, we study the DNN loss surface along the trajectory of SGD by interpolating the loss surface between parameters from consecutive iterations and tracking various metrics during training. We find that the loss interpolation between parameters before and after a training update is roughly convex with a minimum (valley floor) in between for most of the training. Based on this and other metrics, we deduce that during most of the training, SGD explores regions in a valley by bouncing off valley walls at a height above the valley floor. This 'bouncing off walls at a height' mechanism helps SGD traverse larger distances for small batch sizes and large learning rates, which we find play qualitatively different roles in the dynamics. While a large learning rate maintains a large height from the valley floor, a small batch size injects noise facilitating exploration. We find this mechanism is crucial for generalization because the valley floor has barriers and this exploration above the valley floor allows SGD to quickly travel far away from the initialization point (without being affected by barriers) and find flatter regions, corresponding to better generalization.

2018-02-24

arXiv (preprint)

arxiv.org
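
The interpolation probe described in the abstract above can be sketched on a toy problem. This is only an illustration under stated assumptions: a two-dimensional quadratic stands in for the DNN loss, a full gradient step stands in for a mini-batch update, and the names `loss`, `grad`, and `interpolate_loss` are hypothetical.

```python
def loss(theta):
    # Toy quadratic "valley" with its floor at the origin; the first
    # coordinate is steep and the second is shallow.
    return 0.5 * (1.0 * theta[0] ** 2 + 0.1 * theta[1] ** 2)

def grad(theta):
    return [1.0 * theta[0], 0.1 * theta[1]]

def interpolate_loss(theta_before, theta_after, num_points=11):
    """Evaluate the loss at evenly spaced points on the straight segment
    between the parameters before and after one update."""
    curve = []
    for i in range(num_points):
        alpha = i / (num_points - 1)
        point = [(1 - alpha) * b + alpha * a
                 for b, a in zip(theta_before, theta_after)]
        curve.append((alpha, loss(point)))
    return curve

theta_before = [1.0, 1.0]
learning_rate = 1.5  # large enough to overshoot the floor in the steep direction
theta_after = [t - learning_rate * g
               for t, g in zip(theta_before, grad(theta_before))]

curve = interpolate_loss(theta_before, theta_after)
best_alpha, best_loss = min(curve, key=lambda pair: pair[1])
```

Here the lowest interpolated loss lies strictly between the two iterates, so the update crosses the valley floor and lands on the opposite wall, which is the toy analogue of the 'bouncing off walls' picture. On a real network the same probe would be run with the model's parameters and a held-fixed batch.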

Generalization in Machine Learning via Analytical Learning Theory

Kenji Kawaguchi

Yoshua Bengio

This paper introduces a novel measure-theoretic theory for machine learning that does not require statistical assumptions. Based on this theory, a new regularization method in deep learning is derived and shown to outperform previous methods in CIFAR-10, CIFAR-100, and SVHN. Moreover, the proposed theory provides a theoretical basis for a family of practically successful regularization methods in deep learning. We discuss several consequences of our results on one-shot learning, representation learning, deep learning, and curriculum learning. Unlike statistical learning theory, the proposed learning theory analyzes each problem instance individually via measure theory, rather than a set of problem instances via statistics. As a result, it provides different types of results and insights when compared to statistical learning theory.

2018-02-21

arXiv.org (preprint)

dblp.uni-trier.de

Towards Understanding Generalization via Analytical Learning Theory

Kenji Kawaguchi

Yoshua Bengio

This paper introduces a novel measure-theoretic theory for machine learning that does not require statistical assumptions. Based on this theory, a new regularization method in deep learning is derived and shown to outperform previous methods in CIFAR-10, CIFAR-100, and SVHN. Moreover, the proposed theory provides a theoretical basis for a family of practically successful regularization methods in deep learning. We discuss several consequences of our results on one-shot learning, representation learning, deep learning, and curriculum learning. Unlike statistical learning theory, the proposed learning theory analyzes each problem instance individually via measure theory, rather than a set of problem instances via statistics. As a result, it provides different types of results and insights when compared to statistical learning theory.

2018-02-21

arXiv (preprint)

arxiv.org

Boundary Seeking GANs

(Rex) Devon Hjelm

Athul Jacob

Adam Trischler

Gerry Che

Kyunghyun Cho

Yoshua Bengio

Generative adversarial networks (GANs) are a learning framework that relies on training a discriminator to estimate a measure of difference between a target and a generated distribution. GANs, as normally formulated, rely on the generated samples being completely differentiable w.r.t. the generative parameters, and thus do not work for discrete data. We introduce a method for training GANs with discrete data that uses the estimated difference measure from the discriminator to compute importance weights for generated samples, thus providing a policy gradient for training the generator. The importance weights have a strong connection to the decision boundary of the discriminator, and we call our method boundary-seeking GANs (BGANs). We demonstrate the effectiveness of the proposed algorithm with discrete image and character-based natural language generation. In addition, the boundary-seeking objective extends to continuous data, which can be used to improve stability of training, and we demonstrate this on CelebA, Large-scale Scene Understanding (LSUN) bedrooms, and ImageNet without conditioning.

2018-02-15

International Conference on Learning Representations (published)

dblp.uni-trier.de
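
The idea in the abstract above, turning discriminator outputs into importance weights that drive a policy-gradient update on a discrete generator, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the weights take the density-ratio form D(x) / (1 - D(x)), normalized over the sampled batch, and it uses a toy categorical generator parameterized by logits. The discriminator outputs are hard-coded, hypothetical values.

```python
import math

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def normalized_importance_weights(disc_probs):
    """Assumed weight form: density-ratio estimate D / (1 - D),
    normalized so the weights over the batch sum to one."""
    raw = [d / (1.0 - d) for d in disc_probs]
    total = sum(raw)
    return [w / total for w in raw]

def weighted_logit_gradient(logits, samples, weights):
    """Gradient of sum_m w_m * log g(x_m) with respect to the logits of a
    categorical generator g = softmax(logits)."""
    probs = softmax(logits)
    gradient = [0.0] * len(logits)
    for x, w in zip(samples, weights):
        for k in range(len(logits)):
            indicator = 1.0 if k == x else 0.0
            gradient[k] += w * (indicator - probs[k])
    return gradient

logits = [0.0, 0.0, 0.0]          # uniform generator over 3 categories
samples = [0, 1, 2, 2]            # categories drawn from the generator (fixed here)
disc_probs = [0.2, 0.2, 0.9, 0.9] # hypothetical D(x): category 2 looks "real"

weights = normalized_importance_weights(disc_probs)
gradient = weighted_logit_gradient(logits, samples, weights)
# One ascent step moves probability mass toward highly weighted samples.
new_logits = [l + 0.5 * g for l, g in zip(logits, gradient)]
```

Because the samples themselves are never differentiated, only their log-probabilities, the update is well defined for discrete outputs. In this toy run the logit of category 2 rises and the others fall.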

Combining Model-based and Model-free RL via Multi-step Control Variates

Tong Che

Yuchen Lu

George Tucker

Surya Bhupatiraju

Shane Gu

Sergey Levine

Yoshua Bengio

2018-02-15

(published)

openreview.net

Existence of Nash Equilibria on Integer Programming Games

Margarida Carvalho

Andrea Lodi

João Pedro Pedroso

2018-02-15

Springer Proceedings in Mathematics & Statistics (published)

doi.org
