Publications

Language GANs Falling Short

Massimo Caccia

Lucas Caccia

William Fedus

Generating high-quality text with sufficient diversity is essential for a wide range of Natural Language Generation (NLG) tasks. Maximum-Likelihood Estimation (MLE) models trained with teacher forcing have consistently been reported as weak baselines, with their poor performance attributed to exposure bias (Bengio et al., 2015; Ranzato et al., 2015): at inference time, the model is fed its own prediction instead of a ground-truth token, which can lead to accumulating errors and poor samples. This line of reasoning has led to a surge of adversarial approaches to NLG, on the grounds that GANs do not suffer from exposure bias. In this work, we make several surprising observations that contradict common beliefs. First, we revisit the canonical evaluation framework for NLG and point out fundamental flaws with quality-only evaluation: we show that one can outperform such metrics using a simple, well-known temperature parameter to artificially reduce the entropy of the model's conditional distributions. Second, we leverage the control over the quality/diversity trade-off given by this parameter to evaluate models over the whole quality-diversity spectrum, and find that MLE models consistently outperform the proposed GAN variants across the whole quality-diversity space. Our results have several implications: 1) the impact of exposure bias on sample quality is less severe than previously thought; 2) temperature tuning provides a better quality/diversity trade-off than adversarial training while being easier to train, easier to cross-validate, and less computationally expensive. Code to reproduce the experiments is available at github.com/pclucas14/GansFallingShort
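The temperature knob mentioned in the abstract can be illustrated with a short sketch. It assumes a model that outputs next-token logits (the `logits` values below are made up) and divides them by a temperature `T` before the softmax. `T < 1` lowers the entropy of the conditional distribution, which favors quality, and `T > 1` raises it, which favors diversity. The function names are hypothetical and are not taken from the paper's code.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by a temperature.

    temperature < 1 sharpens the distribution (lower entropy),
    temperature > 1 flattens it (higher entropy).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy of a discrete distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical next-token logits
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: entropy={entropy(probs):.3f} nats")
```

Sweeping `T` over a grid and scoring samples on both a quality metric and a diversity metric traces a quality-diversity curve. Curves of this kind are what the abstract proposes for comparing models, instead of relying on a single quality score.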

2020-01-01

ICLR (published)

openreview.net

Learning Graph Structure With A Finite-State Automaton Layer

Daniel D. Johnson

Hugo Larochelle

Danny Tarlow

arxiv.org

Measuring Systematic Generalization in Neural Proof Generation with Transformers

Nicolas Gontier

Koustuv Sinha

Siva Reddy

Chris Pal

We are interested in understanding how well Transformer language models (TLMs) can perform reasoning tasks when trained on knowledge encoded in the form of natural language. We investigate their systematic generalization abilities on a logical reasoning task in natural language, which involves reasoning over relationships between entities grounded in first-order logical proofs. Specifically, we perform soft theorem-proving by leveraging TLMs to generate natural language proofs. We test the generated proofs for logical consistency, along with the accuracy of the final inference. We observe length-generalization issues when models are evaluated on sequences longer than those seen during training. However, we observe that TLMs improve their generalization performance after being exposed to longer, exhaustive proofs. In addition, we discover that TLMs generalize better using backward-chaining proofs than forward-chaining ones, even though they find forward-chaining proofs easier to generate. We also observe that models that are not trained to generate proofs are better at generalizing to problems based on longer proofs. This suggests that Transformers have efficient internal reasoning strategies that are harder to interpret. These results highlight the systematic generalization behavior of TLMs in the context of logical reasoning, and we believe this work motivates deeper inspection of their underlying reasoning strategies.

arxiv.org

Models of Human Behavioral Agents in Bandits, Contextual Bandits and RL

Baihan Lin

Guillermo Cecchi

Djallel Bouneffouf

Jenna Reinen

Irina Rish

2020-01-01

HBAI@IJCAI (published)

doi.org

Natural Language Processing and Text Mining with Graph-Structured Representations

Bang Liu

2020-01-01

(published)

doi.org

N-BEATS: Neural basis expansion analysis for interpretable time series forecasting

Boris Oreshkin

Dmitri Carpov

Nicolas Chapados

Yoshua Bengio

We focus on solving the univariate time series point forecasting problem using deep learning. We propose a deep neural architecture based on backward and forward residual links and a very deep stack of fully connected layers. The architecture has a number of desirable properties: it is interpretable, applicable without modification to a wide array of target domains, and fast to train. We test the proposed architecture on several well-known datasets, including the M3, M4 and TOURISM competition datasets, which contain time series from diverse domains. We demonstrate state-of-the-art performance for two configurations of N-BEATS on all the datasets, improving forecast accuracy by 11% over a statistical benchmark and by 3% over last year's winner of the M4 competition, a domain-adjusted, hand-crafted hybrid of neural network and statistical time series models. The first configuration of our model does not employ any time-series-specific components, and its performance on heterogeneous datasets strongly suggests that, contrary to received wisdom, deep learning primitives such as residual blocks are by themselves sufficient to solve a wide range of forecasting problems. Finally, we demonstrate how the proposed architecture can be augmented to provide outputs that are interpretable without considerable loss in accuracy.
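The "backward and forward residual links" in the abstract can be read as doubly residual stacking. The following is a minimal sketch under that reading. Each block receives the part of the lookback window that is still unexplained and emits a backcast, which is subtracted before the next block, and a partial forecast, which is summed into the output. The toy `make_level_block` is a hypothetical stand-in for a real block, which would be a stack of fully connected layers. Only the wiring between blocks is meant to be illustrative.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
# A block maps the current residual lookback window to (backcast, forecast).
Block = Callable[[Sequence[float]], Tuple[Vector, Vector]]


def make_level_block(horizon: int) -> Block:
    """Hypothetical toy block that explains its input by its mean level.

    A real N-BEATS block is a stack of fully connected layers; this
    stand-in only reproduces the (backcast, forecast) interface.
    """
    def block(window: Sequence[float]) -> Tuple[Vector, Vector]:
        level = sum(window) / len(window)
        return [level] * len(window), [level] * horizon
    return block


def doubly_residual_forecast(window: Sequence[float], blocks: List[Block]) -> Vector:
    """Chain blocks with backward (backcast) and forward (forecast) residual links."""
    residual = list(window)
    forecast: Optional[Vector] = None
    for block in blocks:
        backcast, partial = block(residual)
        # Backward link: the next block only sees what is still unexplained.
        residual = [r - b for r, b in zip(residual, backcast)]
        # Forward link: the final forecast is the sum of all partial forecasts.
        if forecast is None:
            forecast = partial
        else:
            forecast = [f + p for f, p in zip(forecast, partial)]
    return forecast if forecast is not None else []


print(doubly_residual_forecast([1.0, 2.0, 3.0, 4.0],
                               [make_level_block(2), make_level_block(2)]))
# → [2.5, 2.5]
```

In this sketch, each partial forecast can be inspected on its own because the final forecast is a plain sum of per-block outputs. That additive structure is one plausible basis for the interpretability the abstract refers to.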

2020-01-01

ICLR (published)

openreview.net

Online Fast Adaptation and Knowledge Accumulation (OSAKA): a New Approach to Continual Learning

Massimo Caccia

Pau Rodriguez

Oleksiy Ostapenko

Fabrice Normandin

Min Lin

Lucas Caccia

Issam Hadj Laradji

Irina Rish

Alexandre Lacoste

David Vazquez

Laurent Charlin

An operator view of policy gradient methods

Dibya Ghosh

Marlos C. Machado

Nicolas Le Roux

We cast policy gradient methods as the repeated application of two operators: a policy improvement operator …

arxiv.org

Practical Dynamic SC-Flip Polar Decoders: Algorithm and Implementation

Furkan Ercan

Thibaud Tonnellier

Nghia Doan

Warren Gross

SC-Flip (SCF) is a low-complexity polar code decoding algorithm with improved performance, and is an alternative to high-complexity CRC-aided SC-List (CA-SCL) decoding. However, the performance improvement of SCF is limited since it can correct up to only one channel error (

2020-01-01

IEEE Transactions on Signal Processing (published)

doi.org

arxiv.org

Principal Neighbourhood Aggregation for Graph Nets

Gabriele Corso

Luca Cavalleri

Dominique Beaini

Pietro Lio

Petar Veličković

arxiv.org

Gradient-Based Neural DAG Learning with Interventions

Philippe Brouillard

Alexandre Drouin

Sébastien Lachapelle

Alexandre Lacoste

Simon Lacoste-Julien

Decision making based on statistical association alone can be a dangerous endeavor due to non-causal associations. Ideally, one would rely on causal relationships that enable reasoning about the effect of interventions. Several methods have been proposed to discover such relationships from observational and interventional data. Among them, GraN-DAG, a method that relies on the constrained optimization of neural networks, was shown to produce state-of-the-art results among algorithms relying purely on observational data. However, it is limited to observational data and cannot make use of interventions. In this work, we extend GraN-DAG to support interventional data and show that this improves its ability to infer causal structures.
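The "constrained optimization" in this abstract refers to a continuous acyclicity constraint on the learned graph. The sketch below assumes the trace-exponential form popularized by NOTEARS, h(A) = tr(e^A) - d, applied to a nonnegative weighted adjacency matrix `A` over `d` variables. That GraN-DAG and its interventional extension use exactly this form is an assumption here and is not stated in the abstract. Because `A` is nonnegative, h(A) is a sum of traces of powers of `A`, and it is zero exactly when `A` has no directed cycles. The matrix exponential is approximated with a truncated Taylor series; `terms=30` is a hypothetical choice.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace_expm(a: Matrix, terms: int = 30) -> float:
    """Approximate tr(exp(A)) with a truncated Taylor series sum_k tr(A^k) / k!."""
    n = len(a)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0 / 0!
    total = float(n)  # trace of the identity
    for k in range(1, terms):
        term = matmul(term, a)                     # A^k / (k-1)!
        term = [[x / k for x in row] for row in term]  # A^k / k!
        total += sum(term[i][i] for i in range(n))
    return total


def acyclicity(a: Matrix) -> float:
    """h(A) = tr(exp(A)) - d; zero iff the nonnegative matrix A encodes a DAG."""
    return trace_expm(a) - len(a)


dag = [[0.0, 0.5, 0.0],
       [0.0, 0.0, 2.0],
       [0.0, 0.0, 0.0]]   # 0 -> 1 -> 2, no cycles
cycle = [[0.0, 1.0],
         [1.0, 0.0]]      # 0 <-> 1, a 2-cycle
print(acyclicity(dag), acyclicity(cycle))
```

In an augmented-Lagrangian training loop, a penalty on h(A) would push the learned weights toward a DAG. The sketch only shows how the constraint value is computed, not how GraN-DAG derives `A` from network weights.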

2020-01-01

(published)

www.semanticscholar.org

In Search of Robust Measures of Generalization

Gintare Karolina Dziugaite

Alexandre Drouin

Brady Neal

Nitarshan Rajkumar

Ethan Caballero

Linbo Wang

Ioannis Mitliagkas

Daniel M. Roy

One of the principal scientific challenges in deep learning is explaining generalization, i.e., why the particular way the community now trains networks to achieve small training error also leads to small error on held-out data from the same population. It is widely appreciated that some worst-case theories, such as those based on the VC dimension of the class of predictors induced by modern neural network architectures, are unable to explain empirical performance. A large volume of work aims to close this gap, primarily by developing bounds on generalization error, optimization error, and excess risk. When evaluated empirically, however, most of these bounds are numerically vacuous. Focusing on generalization bounds, this work addresses the question of how to evaluate such bounds empirically. Jiang et al. (2020) recently described a large-scale empirical study aimed at uncovering potential causal relationships between bounds/measures and generalization. Building on their study, we highlight where their proposed methods can obscure failures and successes of generalization measures in explaining generalization. We argue that generalization measures should instead be evaluated within the framework of distributional robustness.

arxiv.org
