Publications

Leveraging Observations in Bandits: Between Risks and Benefits

Andrei-Stefan Lupu

Imitation learning has been widely used to speed up learning in novice agents, by allowing them to leverage existing data from experts. Allo… (voir plus)wing an agent to be influenced by external observations can benefit to the learning process, but it also puts the agent at risk of following sub-optimal behaviours. In this paper, we study this problem in the context of bandits. More specifically, we consider that an agent (learner) is interacting with a bandit-style decision task, but can also observe a target policy interacting with the same environment. The learner observes only the target’s actions, not the rewards obtained. We introduce a new bandit optimism modifier that uses conditional optimism contingent on the actions of the target in order to guide the agent’s exploration. We analyze the effect of this modification on the well-known Upper Confidence Bound algorithm by proving that it preserves a regret upper-bound of order O(lnT), even in the presence of a very poor target, and we derive the dependency of the expected regret on the general target policy. We provide empirical results showing both great benefits as well as certain limitations inherent to observational learning in the multi-armed bandit setting. Experiments are conducted using targets satisfying theoretical assumptions with high probability, thus narrowing the gap between theory and application.

2019-07-17

Proceedings of the AAAI Conference on Artificial Intelligence (publié)

doi.org

Online Adaptative Curriculum Learning for GANs

Thang Doan

Joao Monteiro

Isabela Albuquerque

Bogdan Mazoure

Audrey Durand

Joelle Pineau

(Rex) Devon Hjelm

Generative Adversarial Networks (GANs) can successfully approximate a probability distribution and produce realistic samples. However, open … (voir plus)questions such as sufficient convergence conditions and mode collapse still persist. In this paper, we build on existing work in the area by proposing a novel framework for training the generator against an ensemble of discriminator networks, which can be seen as a one-student/multiple-teachers setting. We formalize this problem within the full-information adversarial bandit framework, where we evaluate the capability of an algorithm to select mixtures of discriminators for providing the generator with feedback during learning. To this end, we propose a reward function which reflects the progress made by the generator and dynamically update the mixture weights allocated to each discriminator. We also draw connections between our algorithm and stochastic optimization methods and then show that existing approaches using multiple discriminators in literature can be recovered from our framework. We argue that less expressive discriminators are smoother and have a general coarse grained view of the modes map, which enforces the generator to cover a wide portion of the data distribution support. On the other hand, highly expressive discriminators ensure samples quality. Finally, experimental results show that our approach improves samples quality and diversity over existing baselines by effectively learning a curriculum. These results also support the claim that weaker discriminators have higher entropy improving modes coverage.

2019-07-17

Proceedings of the AAAI Conference on Artificial Intelligence (publié)

doi.org

arxiv.org

A Cross-Domain Transferable Neural Coherence Model

Peng Xu

H. Saghir

Jin Sung Kang

Teng Long

Avishek Joey Bose

Yanshuai Cao

Jackie Cheung

Coherence is an important aspect of text quality and is crucial for ensuring its readability. One important limitation of existing coherence… (voir plus) models is that training on one domain does not easily generalize to unseen categories of text. Previous work advocates for generative models for cross-domain generalization, because for discriminative models, the space of incoherent sentence orderings to discriminate against during training is prohibitively large. In this work, we propose a local discriminative neural model with a much smaller negative sampling space that can efficiently learn against incorrect orderings. The proposed coherence model is simple in structure, yet it significantly outperforms previous state-of-art methods on a standard benchmark dataset on the Wall Street Journal corpus, as well as in multiple new challenging settings of transfer to unseen categories of discourse on Wikipedia articles.

2019-07-01

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (publié)

doi.org

arxiv.org

EditNTS: An Neural Programmer-Interpreter Model for Sentence Simplification through Explicit Editing

Yue Dong

Zichao Li

Mehdi Rezagholizadeh

Jackie Cheung

We present the first sentence simplification model that learns explicit edit operations (ADD, DELETE, and KEEP) via a neural programmer-inte… (voir plus)rpreter approach. Most current neural sentence simplification systems are variants of sequence-to-sequence models adopted from machine translation. These methods learn to simplify sentences as a byproduct of the fact that they are trained on complex-simple sentence pairs. By contrast, our neural programmer-interpreter is directly trained to predict explicit edit operations on targeted parts of the input sentence, resembling the way that humans perform simplification and revision. Our model outperforms previous state-of-the-art neural sentence simplification models (without external knowledge) by large margins on three benchmark text simplification corpora in terms of SARI (+0.95 WikiLarge, +1.89 WikiSmall, +1.41 Newsela), and is judged by humans to produce overall better and simpler output sentences.

2019-07-01

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (publié)

doi.org

arxiv.org

GAIT: A Geometric Approach to Information Theory

Jose Gallego-Posada

Ankit Vani

Max Schwarzer

Simon Lacoste-Julien

We advocate the use of a notion of entropy that reflects the relative abundances of the symbols in an alphabet, as well as the similarities … (voir plus)between them. This concept was originally introduced in theoretical ecology to study the diversity of ecosystems. Based on this notion of entropy, we introduce geometry-aware counterparts for several concepts and theorems in information theory. Notably, our proposed divergence exhibits performance on par with state-of-the-art methods based on the Wasserstein distance, but enjoys a closed-form expression that can be computed efficiently. We demonstrate the versatility of our method via experiments on a broad range of domains: training generative models, computing image barycenters, approximating empirical measures and counting modes.

2019-06-19

ArXiv (preprint)

arxiv.org

On the interplay between noise and curvature and its effect on optimization and generalization

Valentin Thomas

Fabian Pedregosa

Bart van Merriënboer

Pierre-Antoine Mangazol

Yoshua Bengio

Nicolas Le Roux

The speed at which one can minimize an expected loss using stochastic methods depends on two properties: the curvature of the loss and the v… (voir plus)ariance of the gradients. While most previous works focus on one or the other of these properties, we explore how their interaction affects optimization speed. Further, as the ultimate goal is good generalization performance, we clarify how both curvature and noise are relevant to properly estimate the generalization gap. Realizing that the limitations of some existing works stems from a confusion between these matrices, we also clarify the distinction between the Fisher matrix, the Hessian, and the covariance matrix of the gradients.

2019-06-19

ArXiv (prépublication)

arxiv.org

Information matrices and generalization

Valentin Thomas

Fabian Pedregosa

Bart van Merriënboer

Pierre-Antoine Manzagol

Yoshua Bengio

Nicolas Le Roux

This work revisits the use of information criteria to characterize the generalization of deep learning models. In particular, we empirically… (voir plus) demonstrate the effectiveness of the Takeuchi information criterion (TIC), an extension of the Akaike information criterion (AIC) for misspecified models, in estimating the generalization gap, shedding light on why quantities such as the number of parameters cannot quantify generalization. The TIC depends on both the Hessian of the loss H and the covariance of the gradients C. By exploring the similarities and differences between these two matrices as well as the Fisher information matrix F, we study the interplay between noise and curvature in deep models. We also address the question of whether C is a reasonable approximation to F, as is commonly assumed.

2019-06-18

arXiv.org (prépublication)

dblp.uni-trier.de

Information matrices and generalization

Valentin Thomas

Fabian Pedregosa

Bart van Merriënboer

Pierre-Antoine Manzagol

Yoshua Bengio

Nicolas Le Roux

This work revisits the use of information criteria to characterize the generalization of deep learning models. In particular, we empirically… (voir plus) demonstrate the effectiveness of the Takeuchi information criterion (TIC), an extension of the Akaike information criterion (AIC) for misspecified models, in estimating the generalization gap, shedding light on why quantities such as the number of parameters cannot quantify generalization. The TIC depends on both the Hessian of the loss H and the covariance of the gradients C. By exploring the similarities and differences between these two matrices as well as the Fisher information matrix F, we study the interplay between noise and curvature in deep models. We also address the question of whether C is a reasonable approximation to F, as is commonly assumed.

2019-06-18

arXiv.org (preprint)

dblp.uni-trier.de

Anomaly Detection with Joint Representation Learning of Content and Connection

Junhao Wang

Renhao Wang

Aayushi Kulshrestha

Reihaneh Rabbany

Social media sites are becoming a key factor in politics. These platforms are easy to manipulate for the purpose of distorting information s… (voir plus)pace to confuse and distract voters. Past works to identify disruptive patterns are mostly focused on analyzing the content of tweets. In this study, we jointly embed the information from both user posted content as well as a user's follower network, to detect groups of densely connected users in an unsupervised fashion. We then investigate these dense sub-blocks of users to flag anomalous behavior. In our experiments, we study the tweets related to the upcoming 2019 Canadian Elections, and observe a set of densely-connected users engaging in local politics in different provinces, and exhibiting troll-like behavior.

2019-06-16

ArXiv (prépublication)

arxiv.org

Near-Optimal Glimpse Sequences for Improved Hard Attention Neural Network Training

William Harvey

Michael Teng

Frank Wood

Hard visual attention is a promising approach to reduce the computational burden of modern computer vision methodologies. However, hard atte… (voir plus)ntion mechanisms can be difficult and slow to train, which is especially costly for applications like neural architecture search where multiple networks must be trained. We introduce a method to amortise the cost of training by generating an extra supervision signal for a subset of the training data. This supervision is in the form of sequences of ‘good’ locations to attend to for each image. We find that the best method to generate supervision sequences comes from framing hard attention for image classification as a Bayesian optimal experimental design (BOED) problem. From this perspective, the optimal locations to attend to are those which provide the greatest expected reduction in the entropy of the classification distribution. We introduce methodology from the BOED literature to approximate this optimal behaviour and generate ‘near-optimal’ supervision sequences. We then present a hard attention network training objective that makes use of these sequences and show that it allows faster training than prior work. We finally demonstrate the utility of faster hard attention training by incorporating supervision sequences in a neural architecture search, resulting in hard attention architectures which can outperform networks with access to the entire image.

2019-06-13

ArXiv (preprint)

doi.org

arxiv.org

Stochastic Neural Network with Kronecker Flow

Chin-Wei Huang

Ahmed Touati

Pascal Vincent

Gintare Karolina Dziugaite

Alexandre Lacoste

Aaron Courville

Recent advances in variational inference enable the modelling of highly structured joint distributions, but are limited in their capacity to… (voir plus) scale to the high-dimensional setting of stochastic neural networks. This limitation motivates a need for scalable parameterizations of the noise generation process, in a manner that adequately captures the dependencies among the various parameters. In this work, we address this need and present the Kronecker Flow, a generalization of the Kronecker product to invertible mappings designed for stochastic neural networks. We apply our method to variational Bayesian neural networks on predictive tasks, PAC-Bayes generalization bound estimation, and approximate Thompson sampling in contextual bandits. In all setups, our methods prove to be competitive with existing methods and better than the baselines.

2019-06-10

ArXiv (preprint)

arxiv.org

Dissociating memory accessibility and precision in forgetting

S. Berens

Blake Richards

A. Horner

2019-06-04

Nature Human Behaviour (publié)

doi.org

La recherche en IA au service du monde réel

Boussole des politiques en IA

Vie étudiante et ressources

Publications

La recherche en IA au service du monde réel

Boussole des politiques en IA

Vie étudiante et ressources

Mots-clés populaires:

Publications