Unifying Likelihood-free Inference with Black-box Sequence Design and Beyond
What Makes Machine Reading Comprehension Questions Difficult? Investigating Variation in Passage Sources and Question Types
Susan Bartlett
Grzegorz Kondrak
Max Bartolo
Alastair Roberts
Johannes Welbl
Steven Bird
Ewan Klein
Edward Loper
Samuel R. Bowman
George Dahl
Chao Pang
Junyuan Shang
Jiaxiang Liu
Xuyi Chen
Yanbin Zhao
Yuxiang Lu
Weixin Liu
Zhihua Wu
Weibao Gong …
Jianzhong Liang
Zhizhou Shang
Peng Sun
Ouyang Xuan
Dianhai Yu
Hao Tian
Hua Wu
Haifeng Wang
Adam Trischler
Tong Wang
Xingdi Yuan
Justin Harris
Philip Bachman
Adina Williams
Nikita Nangia
Zhilin Yang
Peng Qi
Saizheng Zhang
For a natural language understanding benchmark to be useful in research, it has to consist of examples that are diverse and difficult enough to discriminate among current and near-future state-of-the-art systems. However, we do not yet know how best to select passages to collect a variety of challenging examples. In this study, we crowdsource multiple-choice reading comprehension questions for passages taken from seven qualitatively distinct sources, analyzing what attributes of passages contribute to the difficulty and question types of the collected examples. To our surprise, we find that passage source, length, and readability measures do not significantly affect question difficulty. Through our manual annotation of seven reasoning types, we observe several trends between passage sources and reasoning types, e.g., logical reasoning is more often required in questions written for technical passages. These results suggest that when creating a new benchmark dataset, selecting a diverse set of passages can help ensure a diverse range of question types, but that passage difficulty need not be a priority.
Bijective-Contrastive Estimation
In this work, we propose Bijective-Contrastive Estimation (BCE), a classification-based learning criterion for energy-based models. We generate a collection of contrasting distributions using bijections, and solve all the classification problems between the original data distribution and the distributions induced by the bijections using a classifier parameterized by an energy model. We show that if the classification objective is minimized, the energy function will uniquely recover the data density up to a normalizing constant. This has the benefit of not having to explicitly specify a contrasting distribution, unlike noise contrastive estimation, which requires an explicit noise distribution. Experimentally, we demonstrate that the proposed method works well on 2D synthetic datasets. We discuss the difficulty in high-dimensional cases, and propose potential directions to explore for future work.
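As a worked illustration of the idea (our own notation, not the paper's), consider the two-class case with a single bijection f and an energy model E_θ; the normalizing constant cancels in the class posterior, so the classifier only needs the energy and the bijection's Jacobian:

```latex
% Hypothetical two-class sketch of the BCE idea (notation is ours, not the paper's).
% Class "data": unnormalized model density e^{-E_\theta(x)}.
% Class "contrast": the pushforward of the data through a bijection f, whose
% model density is e^{-E_\theta(f^{-1}(x))} |det J_{f^{-1}}(x)| (same normalizer).
\[
  P_\theta(\mathrm{data} \mid x)
  = \frac{e^{-E_\theta(x)}}
         {e^{-E_\theta(x)} + e^{-E_\theta(f^{-1}(x))}\,\bigl|\det J_{f^{-1}}(x)\bigr|},
\]
\[
  \mathcal{L}(\theta)
  = -\,\mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log P_\theta(\mathrm{data} \mid x)\bigr]
    \;-\; \mathbb{E}_{x \sim f_{\#} p_{\mathrm{data}}}\bigl[\log\bigl(1 - P_\theta(\mathrm{data} \mid x)\bigr)\bigr].
\]
```

Minimizing this cross-entropy drives the model posterior toward the true one, which is what lets the energy recover the data density up to a normalizing constant; the multi-bijection case replaces the two-way posterior with a softmax over all induced classes.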
Machine Learning for Glacier Monitoring in the Hindu Kush Himalaya
Shimaa Baraka
Benjamin Akera
Bibek Aryal
Tenzing Chogyal Sherpa
Finu Shresta
Anthony Ortiz
Kris Sankaran
J. Ferres
M. Matin
RetroGNN: Approximating Retrosynthesis by Graph Neural Networks for De Novo Drug Design
Cheng-Hao Liu
Maksym Korablyov
Stanisław Jastrzębski
Paweł Włodarczyk-Pruszyński
Marwin Segler
De novo molecule generation often results in chemically unfeasible molecules. A natural idea to mitigate this problem is to bias the search process towards more easily synthesizable molecules using a proxy for synthetic accessibility. However, using currently available proxies still results in highly unrealistic compounds. We investigate the feasibility of training deep graph neural networks to approximate the outputs of retrosynthesis planning software, and their use to bias the search process. We evaluate our method on a benchmark involving searching for drug-like molecules with antibiotic properties. Compared to enumerating over five million existing molecules from the ZINC database, our approach finds molecules predicted to be more likely to be antibiotics while maintaining good drug-like properties and being easily synthesizable. Importantly, our deep neural network can successfully filter out hard-to-synthesize molecules while achieving a
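As a rough sketch of how such a learned proxy could bias the search (assumed interfaces, not the authors' code):

```python
# Hypothetical sketch: a GNN trained to mimic a retrosynthesis planner's
# verdicts is used as a cheap synthesizability filter inside a candidate search.
from typing import Any, Callable, List

def biased_search(propose_candidates: Callable[[], List[Any]],
                  retro_gnn_score: Callable[[Any], float],   # proxy for the planner's output
                  property_score: Callable[[Any], float],    # e.g. predicted antibiotic activity
                  synth_threshold: float = 0.5,
                  n_rounds: int = 100) -> List[Any]:
    """Keep candidates the proxy deems synthesizable; rank survivors by property."""
    kept = []
    for _ in range(n_rounds):
        for mol in propose_candidates():
            # Cheap stand-in for running the full retrosynthesis planner.
            if retro_gnn_score(mol) >= synth_threshold:
                kept.append((property_score(mol), mol))
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [mol for _, mol in kept]
```

The point of the proxy is that the GNN call is cheap enough to run inside the inner search loop, whereas invoking the full retrosynthesis planner there would be prohibitive.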
AR-DAE: Towards Unbiased Neural Entropy Gradient Estimation
Countering Language Drift with Seeded Iterated Learning
Yuchen Lu
Soumye Singhal
Florian Strub
Olivier Pietquin
Pretraining on a human corpus and then finetuning in a simulator has become a standard pipeline for training a goal-oriented dialogue agent. Nevertheless, as soon as the agents are finetuned to maximize task completion, they suffer from the so-called language drift phenomenon: they slowly lose the syntactic and semantic properties of language as they focus only on solving the task. In this paper, we propose a generic approach to counter language drift called Seeded Iterated Learning (SIL). We periodically refine a pretrained student agent by imitating data sampled from a newly generated teacher agent. At each time step, the teacher is created by copying the student agent, before being finetuned to maximize task completion. SIL requires neither external syntactic constraints nor semantic knowledge, making it a valuable task-agnostic finetuning protocol. We evaluate SIL in a toy-setting Lewis Game, and then scale it up to the translation game with natural language. In both settings, SIL helps counter language drift and improves task completion compared to baselines.
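A minimal sketch of the SIL loop as described above, with the finetuning, sampling, and imitation steps left as placeholder callables:

```python
# Hypothetical sketch of Seeded Iterated Learning (helper callables are
# placeholders, not the authors' API).
import copy

def seeded_iterated_learning(student, env, finetune_on_task, sample_interactions,
                             imitate, n_generations):
    for _ in range(n_generations):
        teacher = copy.deepcopy(student)       # 1. seed the teacher from the current student
        finetune_on_task(teacher, env)         # 2. maximize task completion (where drift arises)
        dataset = sample_interactions(teacher, env)
        imitate(student, dataset)              # 3. student imitates the teacher's samples
    return student
```

The imitation step is what keeps the student anchored to the (initially pretrained) language distribution, while task-driven finetuning is confined to the short-lived teacher.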
Perceptual Generative Autoencoders
Zijun Zhang
Ruixiang ZHANG
Zongpeng Li
Modern generative models are usually designed to match target distributions directly in the data space, where the intrinsic dimension of data can be much lower than the ambient dimension. We argue that this discrepancy may contribute to the difficulties in training generative models. We therefore propose to map both the generated and target distributions to a latent space using the encoder of a standard autoencoder, and to train the generator (or decoder) to match the target distribution in the latent space. Specifically, we enforce consistency in both the data space and the latent space with theoretically justified data and latent reconstruction losses. The resulting generative model, which we call a perceptual generative autoencoder (PGA), is then trained with a maximum likelihood or variational autoencoder (VAE) objective. With maximum likelihood, PGAs generalize the idea of reversible generative models to unrestricted neural network architectures and an arbitrary number of latent dimensions. When combined with VAEs, PGAs substantially improve over the baseline VAEs in terms of sample quality. Compared to other autoencoder-based generative models using simple priors, PGAs achieve state-of-the-art FID scores on CIFAR-10 and CelebA.
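In our own notation (the exact weighting and the latent-space likelihood term are defined in the paper), the two consistency terms mentioned above can be sketched as squared-error reconstructions, with E the encoder and D the decoder/generator:

```latex
% Sketch of the two consistency losses; the squared-error form is our assumption.
\[
  \mathcal{L}_{\mathrm{data}}
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\,\bigl\| D(E(x)) - x \bigr\|^2,
  \qquad
  \mathcal{L}_{\mathrm{latent}}
  = \mathbb{E}_{z}\,\bigl\| E(D(z)) - z \bigr\|^2 .
\]
```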
DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning
Timo Milbich
Karsten Roth
Homanga Bharadhwaj
Samarth Sinha
Bjorn Ommer
Joseph Paul Cohen
Experience Grounds Language
Yonatan Bisk
Ari Holtzman
Jesse D. Thomason
Jacob Andreas
Joyce Yue Chai
Mirella Lapata
Angeliki Lazaridou
Jonathan May
Aleksandr Nisnevich
Nicolas Pinto
Joseph Turian
Supervised Seeded Iterated Learning for Interactive Language Learning
Yuchen Lu
Soumye Singhal
Florian Strub
Olivier Pietquin
The Bottleneck Simulator: A Model-based Deep Reinforcement Learning Approach
Iulian V. Serban
Chinnadhurai Sankar
Michael Pieper
Deep reinforcement learning has recently shown many impressive successes. However, one major obstacle towards applying such methods to real-world problems is their lack of data-efficiency. To this end, we propose the Bottleneck Simulator: a model-based reinforcement learning method which combines a learned, factorized transition model of the environment with rollout simulations to learn an effective policy from few examples. The learned transition model employs an abstract, discrete (bottleneck) state, which increases sample efficiency by reducing the number of model parameters and by exploiting structural properties of the environment. We provide a mathematical analysis of the Bottleneck Simulator in terms of fixed points of the learned policy, which reveals how performance is affected by four distinct sources of error: an error related to the abstract space structure, an error related to the transition model estimation variance, an error related to the transition model estimation bias, and an error related to the transition model class bias. Finally, we evaluate the Bottleneck Simulator on two natural language processing tasks: a text adventure game and a real-world, complex dialogue response selection task. On both tasks, the Bottleneck Simulator yields excellent performance, beating competing approaches.
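A minimal sketch of the simulate-then-update loop described above, with the abstract transition model and policy behind placeholder interfaces (not the authors' implementation):

```python
# Hypothetical sketch of model-based policy improvement with an abstract,
# discrete "bottleneck" transition model; all helpers are placeholders.
def bottleneck_simulator(policy, real_transitions, fit_abstract_model,
                         n_iters, n_rollouts, horizon):
    # Fit a factorized transition model over the abstract state; the small,
    # discrete state space is what buys sample efficiency.
    model = fit_abstract_model(real_transitions)
    for _ in range(n_iters):
        simulated = []
        for _ in range(n_rollouts):
            z = model.sample_initial_state()
            for _ in range(horizon):
                a = policy.act(z)
                z, r = model.step(z, a)        # simulated abstract transition and reward
                simulated.append((z, a, r))
        policy.update(simulated)               # e.g. fitted Q-learning on simulated experience
    return policy
```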