Alessandro Sordoni

Peng Li

Jie Zhou

Aaron Courville

2022-04-30

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (publié)

Combining Modular Skills in Multitask Learning

Edoardo M. Ponti

Siva Reddy

A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to ne… (voir plus)w tasks. In this work, we assume that each task is associated with a subset of latent discrete skills from a (potentially small) inventory. In turn, skills correspond to parameter-efficient (sparse / low-rank) model parameterisations. By jointly learning these and a task-skill allocation matrix, the network for each task is instantiated as the average of the parameters of active skills. To favour non-trivial soft partitions of skills across tasks, we experiment with a series of inductive biases, such as an Indian Buffet Process prior and a two-speed learning rate. We evaluate our latent-skill model on two main settings: 1) multitask reinforcement learning for grounded instruction following on 8 levels of the BabyAI platform; and 2) few-shot adaptation of pre-trained text-to-text generative models on CrossFit, a benchmark comprising 160 NLP tasks. We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning, compared to baselines with fully shared, task-specific, or conditionally generated parameters where knowledge is entangled across tasks. In addition, we show how discrete skills help interpretability, as they yield an explicit hierarchy of tasks.

2022-02-27

ArXiv (prépublication)

Learning to Dequantise with Truncated Flows

Dequantisation is a general technique used for transforming data described by a discrete random variable x into a continuous (latent) random… (voir plus) variable z, for the purpose of it being modeled by likelihood-based density models. Dequantisation was first introduced in the context of ordinal data, such as image pixel values. However, when the data is categorical, the dequantisation scheme is not obvious. We learn such a dequantisation scheme q(z|x), using variational inference with TRUncated FLows (TRUFL) — a novel flow-based model that allows the dequantiser to have a learnable truncated support. Unlike previous work, the TRUFL dequantiser is (i) capable of embedding the data losslessly in certain cases, since the truncation allows the conditional distributions q(z|x) to have non-overlapping bounded supports, while being (ii) trainable with back-propagation. Addtionally, since the support of the marginal q(z) is bounded and the support of prior p(z) is not, we propose to renormalise the prior distribution over the support of q(z). We derive a lower bound for training, and propose a rejection sampling scheme to account for the invalid samples. Experimentally, we benchmark TRUFL on constrained generation tasks, and find that it outperforms prior approaches. In addition, we find that rejection sampling results in higher validity for the constrained problems.

2021-12-31

International Conference on Learning Representations (publié)

openreview.net

Towards Policy-Guided Conversational Recommendation with Dialogue Acts

Paul Crook

Y-Lan Boureau

J. Weston

Akbar Karimi

Leonardo Rossi

Andrea Prati

Wenqiang Lei

Xiangnan He

Qingyun Yisong Miao

Richang Wu

Min-Yen Hong

Kan Tat-Seng

Raymond Li

Samira Ebrahimi Kahou

Hannes Schulz

Zujie Liang

Huang Hu

Can Xu

Jian Miao

Lizi Liao … (voir 47 de plus)

Ryuichi Takanobu

Yunshan Ma

Xun Yang

Wenchang Ma

Minlie Huang

Minghao Tu

Iulian Serban

Aaron C. Courville

David Silver

Julian Schrittwieser

K. Simonyan

Ioannis Antonoglou

Aja Huang

A. Guez

Hanlin Zhu

O. Vinyals

Igor Babuschkin

Junyoung Chung

M. Mathieu

Max Jaderberg

Wojciech M. Czar-725 necki

A. Dudzik

Petko Georgiev

Richard Powell

T. Ewalds

Dan Horgan

M. Kroiss

Ivo Danihelka

J. Agapiou

Junhyuk Oh

Valentin Dalibard

David Choi

L. Sifre

Yury Sulsky

Sasha Vezhnevets

James Molloy

Trevor Cai

D. Budden

T. Paine

Caglar Gulçehre

Ziyu Wang

Tobias Pfaff

Tobias Pohlen

2021-12-31

(publié)

www.semanticscholar.org

Explicitly Modeling Syntax in Language Models with Incremental Parsing and a Dynamic Oracle

Syntax is fundamental to our thinking about language. Failing to capture the structure of input language could lead to generalization proble… (voir plus)ms and over-parametrization. In the present work, we propose a new syntax-aware language model: Syntactic Ordered Memory (SOM). The model explicitly models the structure with an incremental parser and maintains the conditional probability setting of a standard language model (left-to-right). To train the incremental parser and avoid exposure bias, we also propose a novel dynamic oracle, so that SOM is more robust to wrong parsing decisions. Experiments show that SOM can achieve strong results in language modeling, incremental parsing and syntactic generalization tests, while using fewer parameters than other models.

2021-05-31

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (publié)

Understanding by Understanding Not: Modeling Negation in Language Models

R Devon Hjelm

Negation is a core construction in natural language. Despite being very successful on many tasks, state-of-the-art pre-trained language mode… (voir plus)ls often handle negation incorrectly. To improve language models in this regard, we propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences from a raw text corpus. By training BERT with the resulting combined objective we reduce the mean top~1 error rate to 4% on the negated LAMA dataset. We also see some improvements on the negated NLI benchmarks.

2021-05-31

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (publié)

What Makes Machine Reading Comprehension Questions Difﬁcult? Investigating Variation in Passage Sources and Question Types

Susan Bartlett

Grzegorz Kondrak

Max Bartolo

Alastair Roberts

Johannes Welbl

Steven Bird

Ewan Klein

Edward Loper

Samuel R. Bowman

George Dahl. 2021

What

Chao Pang

Junyuan Shang

Jiaxiang Liu

Xuyi Chen

Yanbin Zhao

Yuxiang Lu

Weixin Liu

Zhi-901 hua Wu

Weibao Gong … (voir 21 de plus)

Jianzhong Liang

Zhizhou Shang

Peng Sun

Ouyang Xuan

Dianhai

Houwen Tian

Hua Wu

Haifeng Wang

Adam Trischler

Tong Wang

Xingdi Yuan

Justin Har-908

Philip Bachman

Adina Williams

Nikita Nangia

Zhilin Yang

Peng Qi

Saizheng Zhang

Yoshua Bengio

ing. In

For a natural language understanding bench-001 mark to be useful in research, it has to con-002 sist of examples that are diverse and difﬁ… (voir plus)-003 cult enough to discriminate among current and 004 near-future state-of-the-art systems. However, 005 we do not yet know how best to select pas-006 sages to collect a variety of challenging exam-007 ples. In this study, we crowdsource multiple-008 choice reading comprehension questions for 009 passages taken from seven qualitatively dis-010 tinct sources, analyzing what attributes of pas-011 sages contribute to the difﬁculty and question 012 types of the collected examples. To our sur-013 prise, we ﬁnd that passage source, length, and 014 readability measures do not signiﬁcantly affect 015 question difﬁculty. Through our manual anno-016 tation of seven reasoning types, we observe 017 several trends between passage sources and 018 reasoning types, e.g., logical reasoning is more 019 often required in questions written for techni-020 cal passages. These results suggest that when 021 creating a new benchmark dataset, selecting a 022 diverse set of passages can help ensure a di-023 verse range of question types, but that passage 024 difﬁculty need not be a priority. 025

2020-12-31

(publié)

www.semanticscholar.org

Recursive Top-Down Production for Sentence Generation with Latent Trees

Timothy J. O'Donnell

We model the recursive production property of context-free grammars for natural and synthetic languages. To this end, we present a dynamic p… (voir plus)rogramming algorithm that marginalises over latent binary tree structures with

2020-10-31

Findings of the Association for Computational Linguistics: EMNLP 2020 (publié)

Explicitly Modeling Syntax in Language Model improves Generalization

Syntax is fundamental to our thinking about language. Although neural networks are very successful in many tasks, they do not explicitly mod… (voir plus)el syntactic structure. Failing to capture the structure of inputs could lead to generalization problems and over-parametrization. In the present work, we propose a new syntax-aware language model: Syntactic Ordered Memory (SOM). The model explicitly models the structure with a one-step look-ahead parser and maintains the conditional probability setting of the standard language model. Experiments show that SOM can achieve strong results in language modeling and syntactic generalization tests, while using fewer parameters then other models.

2020-10-20

arXiv.org (prépublication)

Ordered Memory

Stack-augmented recurrent neural networks (RNNs) have been of interest to the deep learning community for some time. However, the difficulty… (voir plus) of training memory models remains a problem obstructing the widespread use of such models. In this paper, we propose the Ordered Memory architecture. Inspired by Ordered Neurons (Shen et al., 2018), we introduce a new attention-based mechanism and use its cumulative probability to control the writing and erasing operation of the memory. We also introduce a new Gated Recursive Cell to compose lower-level representations into higher-level representation. We demonstrate that our model achieves strong performance on the logical inference task (Bowman et al., 2015)and the ListOps (Nangia and Bowman, 2018) task. We can also interpret the model to retrieve the induced tree structure, and find that these induced structures align with the ground truth. Finally, we evaluate our model on the Stanford SentimentTreebank tasks (Socher et al., 2013), and find that it performs comparatively with the state-of-the-art methods in the literature.

2019-10-28

ArXiv (prépublication)

Brief Report: Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks

Yikeng Shen

Shawn Tan

Aaron Courville

2019-05-05

(publié)

www.semanticscholar.org

An Empirical Study of Example Forgetting During Deep Neural Network Learning

Mariya Toneva

Remi Tachet des Combes

Adam Trischler

Yoshua Bengio

Geoffrey J. Gordon

Inspired by the phenomenon of catastrophic forgetting, we investigate the learning dynamics of neural networks as they train on single class… (voir plus)ification tasks. Our goal is to understand whether a related phenomenon occurs when data does not undergo a clear distributional shift. We define a `forgetting event' to have occurred when an individual training example transitions from being classified correctly to incorrectly over the course of learning. Across several benchmark data sets, we find that: (i) certain examples are forgotten with high frequency, and some not at all; (ii) a data set's (un)forgettable examples generalize across neural architectures; and (iii) based on forgetting dynamics, a significant fraction of examples can be omitted from the training data set while still maintaining state-of-the-art generalization performance.

2018-12-31

ICLR.cc/2019/Conference (poster)