Publications

An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents
Felipe Petroski Such
Vashisht Madhavan
Rosanne Liu
Rui Wang
Yulun Li
Jiale Zhi
Ludwig Schubert
Jeff Clune
Joel Lehman
Much human and computational effort has aimed to improve how deep reinforcement learning (DRL) algorithms perform on benchmarks such as the Atari Learning Environment. Comparatively less effort has focused on understanding what has been learned by such methods, and investigating and comparing the representations learned by different families of DRL algorithms. Sources of friction include the onerous computational requirements, and general logistical and architectural complications for running DRL algorithms at scale. We lessen this friction by (1) training several algorithms at scale and releasing trained models, (2) integrating with a previous DRL model release, and (3) releasing code that makes it easy for anyone to load, visualize, and analyze such models. This paper introduces the Atari Zoo framework, which contains models trained across benchmark Atari games, in an easy-to-use format, as well as code that implements common modes of analysis and connects such models to a popular neural network visualization library. Further, to demonstrate the potential of this dataset and software package, we show initial quantitative and qualitative comparisons between the performance and representations of several DRL algorithms, highlighting interesting and previously unknown distinctions between them.
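One of the analysis modes described here is comparing the internal representations of agents trained by different algorithms. As a minimal, illustrative sketch only (this is not the released Atari Zoo code, and synthetic arrays stand in for real activations collected from the released models), hidden-layer activations of two agents on the same frames can be projected into a shared low-dimensional space and compared:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-ins for hidden-layer activations collected over one rollout:
# rows are frames, columns are units. In practice these would come from two of
# the released models evaluated on the same sequence of Atari frames.
rng = np.random.default_rng(0)
acts_dqn = rng.normal(size=(500, 256))          # hypothetical DQN activations
acts_es = rng.normal(size=(500, 256)) + 0.5     # hypothetical ES activations

# Fit a single PCA on the concatenated activations so both agents are embedded
# in the same 2-D space, then compare their activation trajectories.
pca = PCA(n_components=2).fit(np.vstack([acts_dqn, acts_es]))
emb_dqn = pca.transform(acts_dqn)
emb_es = pca.transform(acts_es)

# A crude similarity summary: distance between the agents' embedding centroids.
print("centroid gap:", np.linalg.norm(emb_dqn.mean(0) - emb_es.mean(0)))
```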
Interpolation Consistency Training for Semi-Supervised Learning
Vikas Verma
Kenji Kawaguchi
Alex Lamb
Juho Kannala
David Lopez-Paz
Arno Solin
A principled approach for generating adversarial images under non-smooth dissimilarity metrics
Aram-Alexandre Pooladian
Chris J. Finlay
Tim Hoheisel
Adam M. Oberman
Deep neural networks perform well on real world data but are prone to adversarial perturbations: small changes in the input easily lead to misclassification. In this work, we propose an attack methodology not only for cases where the perturbations are measured by …
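The abstract excerpt above is truncated. For orientation only, the sketch below shows the standard projected-gradient attack under an ℓ∞ budget, i.e. the familiar smooth/ℓp-style baseline; the non-smooth dissimilarity metrics of the title generally call for different machinery, which is not shown here, so this is not the paper's method. The model, data, and budget are placeholders.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=0.03, step=0.01, iters=10):
    """Standard ell-infinity projected gradient attack (illustrative baseline)."""
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the ell-infinity ball around x.
        x_adv = x_adv.detach() + step * grad.sign()
        x_adv = x + torch.clamp(x_adv - x, -eps, eps)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)   # keep a valid image range
    return x_adv.detach()

# Placeholder model and data, just to show the call shape.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
x = torch.rand(4, 1, 28, 28)
y = torch.randint(0, 10, (4,))
x_adv = pgd_linf(model, x, y)
```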
A Comparative Analysis of Expected and Distributional Reinforcement Learning
Since their introduction a year ago, distributional approaches to reinforcement learning (distributional RL) have produced strong results relative to the standard approach which models expected values (expected RL). However, aside from convergence guarantees, there have been few theoretical results investigating the reasons behind the improvements distributional RL provides. In this paper we begin the investigation into this fundamental question by analyzing the differences in the tabular, linear approximation, and non-linear approximation settings. We prove that in many realizations of the tabular and linear approximation settings, distributional RL behaves exactly the same as expected RL. In cases where the two methods behave differently, distributional RL can in fact hurt performance when it does not induce identical behaviour. We then continue with an empirical analysis comparing distributional and expected RL methods in control settings with non-linear approximators to tease apart where the improvements from distributional RL methods are coming from.
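As an illustrative sketch of the tabular comparison (not the paper's proofs, and with a made-up support, transition, and next-state distribution), the snippet below applies one expected Bellman backup and one categorical, C51-style distributional backup to the same transition and checks that the mean of the projected distribution coincides with the expected target whenever no probability mass is clipped at the edges of the support:

```python
import numpy as np

def categorical_projection(p_next, r, gamma, z):
    """Project the shifted/scaled categorical distribution back onto support z."""
    vmin, vmax = z[0], z[-1]
    dz = z[1] - z[0]
    tz = np.clip(r + gamma * z, vmin, vmax)
    b = (tz - vmin) / dz
    lo, hi = np.floor(b).astype(int), np.ceil(b).astype(int)
    m = np.zeros_like(z)
    for j in range(len(z)):
        if lo[j] == hi[j]:                     # target lands exactly on an atom
            m[lo[j]] += p_next[j]
        else:                                  # split mass between neighbouring atoms
            m[lo[j]] += p_next[j] * (hi[j] - b[j])
            m[hi[j]] += p_next[j] * (b[j] - lo[j])
    return m

z = np.linspace(-10.0, 10.0, 51)               # fixed support (atoms)
p_next = np.full(51, 1.0 / 51)                 # made-up next-state value distribution
r, gamma = 1.0, 0.9

# Expected-RL backup: a single scalar target.
expected_target = r + gamma * np.dot(p_next, z)

# Distributional backup: a full distribution whose mean matches that target
# here, since no mass is clipped at the edges of the support.
m = categorical_projection(p_next, r, gamma, z)
print(expected_target, np.dot(m, z))           # the two means coincide
```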
Contextualized Non-local Neural Networks for Sequence Learning
Pengfei Liu
Shuaichen Chang
Xuanjing Huang
Recently, a large number of neural mechanisms and models have been proposed for sequence learning, of which self-attention, as exemplified by the Transformer model, and graph neural networks (GNNs) have attracted much attention. In this paper, we propose an approach that combines and draws on the complementary strengths of these two methods. Specifically, we propose contextualized non-local neural networks (CN3), which can both dynamically construct a task-specific structure of a sentence and leverage rich local dependencies within a particular neighbourhood. Experimental results on ten NLP tasks in text classification, semantic matching, and sequence labelling show that our proposed model outperforms competitive baselines and discovers task-specific dependency structures, thus providing better interpretability to users.
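A minimal sketch of the general idea of dynamically building a soft sentence graph from attention scores and then passing messages over it; this illustrates the attention-as-graph view rather than the CN3 architecture itself, and all dimensions and weights are arbitrary:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_tokens, d = 6, 8
h = rng.normal(size=(n_tokens, d))             # token representations
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))

# Step 1: dynamically construct a soft graph over tokens from attention scores.
scores = (h @ wq) @ (h @ wk).T / np.sqrt(d)
adjacency = softmax(scores, axis=-1)           # row-normalized edge weights

# Step 2: one round of message passing over the constructed graph.
messages = adjacency @ (h @ wv)
h_new = np.tanh(h + messages)                  # residual update with nonlinearity
print(h_new.shape)                             # (6, 8)
```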
Generating Character Descriptions for Automatic Summarization of Fiction
Weiwei Zhang
J. Oren
Summaries of fictional stories allow readers to quickly decide whether or not a story catches their interest. A major challenge in automatic summarization of fiction is the lack of standardized evaluation methodology or high-quality datasets for experimentation. In this work, we take a bottom-up approach to this problem by assuming that story authors are uniquely qualified to inform such decisions. We collect a dataset of one million fiction stories with accompanying author-written summaries from Wattpad, an online story sharing platform. We identify commonly occurring summary components, of which a description of the main characters is the most frequent, and elicit descriptions of main characters directly from the authors for a sample of the stories. We propose two approaches to generate character descriptions, one based on ranking attributes found in the story text, the other based on classifying into a list of pre-defined attributes. We find that the classification-based approach performs the best in predicting character descriptions.
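As an illustrative sketch of the classification-based variant only (toy story snippets and a made-up attribute list, not the Wattpad data or the paper's model): character description can be framed as multi-label classification of story text into a predefined attribute inventory.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.pipeline import make_pipeline

# Toy story snippets and hypothetical predefined character attributes.
stories = [
    "She fought the dragon alone and never looked back.",
    "He spent his days in the library, quiet and curious.",
    "They joked through every disaster, laughing at danger.",
]
attributes = [["brave"], ["shy", "curious"], ["funny", "brave"]]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(attributes)

# Bag-of-words features with one binary classifier per attribute.
clf = make_pipeline(TfidfVectorizer(),
                    OneVsRestClassifier(LogisticRegression(max_iter=1000)))
clf.fit(stories, y)

pred = clf.predict(["A quiet girl who reads everything she can find."])
print(mlb.inverse_transform(pred))
```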
Learning Multi-Task Communication with Message Passing for Sequence Learning
Pengfei Liu
Jie Fu
Yue Dong
Xipeng Qiu
We present two architectures for multi-task learning with neural sequence models. Our approach allows the relationships between different tasks to be learned dynamically, rather than using an ad-hoc pre-defined structure as in previous work. We adopt the idea from message-passing graph neural networks, and propose a general graph multi-task learning framework in which different tasks can communicate with each other in an effective and interpretable way. We conduct extensive experiments in text classification and sequence labelling to evaluate our approach on multi-task learning and transfer learning. The empirical results show that our models not only outperform competitive baselines, but also learn interpretable and transferable patterns across tasks.
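A minimal sketch of the idea of tasks as nodes in a graph that exchange messages (made-up dimensions and a plain weighted-average message function; the paper's actual architectures and learned communication weights are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
n_tasks, d = 3, 16
task_states = rng.normal(size=(n_tasks, d))    # one state vector per task

# Communication weights between tasks (random stand-ins for learned weights);
# rows normalized so each task aggregates a convex combination of the others.
raw = rng.random((n_tasks, n_tasks))
np.fill_diagonal(raw, 0.0)
comm = raw / raw.sum(axis=1, keepdims=True)

for _ in range(2):                             # two rounds of message passing
    messages = comm @ task_states
    task_states = np.tanh(task_states + messages)

print(task_states.shape)                       # (3, 16)
```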
Learning Options with Interest Functions
Learning temporal abstractions which are partial solutions to a task and could be reused for solving other tasks is an ingredient that can help agents to plan and learn efficiently. In this work, we tackle this problem in the options framework. We aim to autonomously learn options which are specialized in different state space regions by proposing a notion of interest functions, which generalizes initiation sets from the options framework for function approximation. We build on the option-critic framework to derive policy gradient theorems for interest functions, leading to a new interest-option-critic architecture.
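As an illustrative sketch only (the exact interest-option-critic objectives and gradient theorems are in the paper; the reweighting below is one natural way to use an interest function and is an assumption for illustration): an interest function I_ω(s) in [0, 1] scales how likely each option is to be selected in a state, generalizing the hard in/out membership of an initiation set.

```python
import numpy as np

def option_choice_probs(pi_omega, interest):
    """Reweight a policy over options by per-option interest in the current state."""
    weighted = pi_omega * interest
    return weighted / weighted.sum()

pi_omega = np.array([0.25, 0.25, 0.25, 0.25])   # base policy over 4 options
interest = np.array([0.9, 0.1, 0.5, 0.0])       # interest of each option in state s;
                                                # 0.0 recovers "not in the initiation set"
print(option_choice_probs(pi_omega, interest))
```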
Leveraging Observations in Bandits: Between Risks and Benefits
Andrei-stefan Lupu
Imitation learning has been widely used to speed up learning in novice agents, by allowing them to leverage existing data from experts. Allowing an agent to be influenced by external observations can benefit the learning process, but it also puts the agent at risk of following sub-optimal behaviours. In this paper, we study this problem in the context of bandits. More specifically, we consider that an agent (learner) is interacting with a bandit-style decision task, but can also observe a target policy interacting with the same environment. The learner observes only the target’s actions, not the rewards obtained. We introduce a new bandit optimism modifier that uses conditional optimism contingent on the actions of the target in order to guide the agent’s exploration. We analyze the effect of this modification on the well-known Upper Confidence Bound algorithm by proving that it preserves a regret upper bound of order O(ln T), even in the presence of a very poor target, and we derive the dependency of the expected regret on the general target policy. We provide empirical results showing both great benefits as well as certain limitations inherent to observational learning in the multi-armed bandit setting. Experiments are conducted using targets satisfying theoretical assumptions with high probability, thus narrowing the gap between theory and application.
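As an illustrative sketch of the general shape of the approach (the precise optimism modifier and its regret analysis are in the paper; the particular bonus below and the target being optimal are assumptions made for the example): a UCB-style index receives an extra optimism term on the arm the observed target just played, while the target's rewards stay hidden from the learner.

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, horizon = 5, 2000
true_means = rng.random(n_arms)
target_arm = int(np.argmax(true_means))          # a (here: optimal) target policy

counts = np.zeros(n_arms)
sums = np.zeros(n_arms)

for t in range(1, horizon + 1):
    observed = target_arm                        # learner sees the target's action only
    if t <= n_arms:                              # pull every arm once to initialize
        arm = t - 1
    else:
        means = sums / counts
        ucb = means + np.sqrt(2 * np.log(t) / counts)
        # Assumed extra optimism on the arm the target was observed to play.
        ucb[observed] += np.sqrt(np.log(t) / counts[observed])
        arm = int(np.argmax(ucb))
    reward = float(rng.random() < true_means[arm])   # Bernoulli reward
    counts[arm] += 1
    sums[arm] += reward

print("empirical means:", np.round(sums / np.maximum(counts, 1), 2))
```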
Online Adaptative Curriculum Learning for GANs
Thang Doan
Joao Monteiro
Isabela Albuquerque
Bogdan Mazoure
Generative Adversarial Networks (GANs) can successfully approximate a probability distribution and produce realistic samples. However, open questions such as sufficient convergence conditions and mode collapse still persist. In this paper, we build on existing work in the area by proposing a novel framework for training the generator against an ensemble of discriminator networks, which can be seen as a one-student/multiple-teachers setting. We formalize this problem within the full-information adversarial bandit framework, where we evaluate the capability of an algorithm to select mixtures of discriminators for providing the generator with feedback during learning. To this end, we propose a reward function which reflects the progress made by the generator and dynamically update the mixture weights allocated to each discriminator. We also draw connections between our algorithm and stochastic optimization methods and then show that existing approaches using multiple discriminators in the literature can be recovered from our framework. We argue that less expressive discriminators are smoother and have a general, coarse-grained view of the mode map, which forces the generator to cover a wide portion of the support of the data distribution. On the other hand, highly expressive discriminators ensure sample quality. Finally, experimental results show that our approach improves sample quality and diversity over existing baselines by effectively learning a curriculum. These results also support the claim that weaker discriminators have higher entropy, improving mode coverage.
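A minimal sketch of the full-information exponential-weights (Hedge) step one would use to maintain a mixture over discriminators; the reward definition here is a random placeholder, whereas the paper derives its own reward reflecting generator progress, so this shows only the weight-update mechanics, not the proposed algorithm.

```python
import numpy as np

def hedge_update(log_weights, rewards, eta=0.1):
    """Full-information exponential-weights (Hedge) update over discriminators."""
    log_weights = log_weights + eta * rewards
    w = np.exp(log_weights - log_weights.max())   # shift for numerical stability
    return log_weights, w / w.sum()

rng = np.random.default_rng(0)
n_discriminators = 3
log_w = np.zeros(n_discriminators)
mixture = np.full(n_discriminators, 1.0 / n_discriminators)

for step in range(100):
    # Placeholder rewards: in the paper these reflect the generator's progress
    # against each discriminator; random stand-ins are used here.
    rewards = rng.random(n_discriminators)
    log_w, mixture = hedge_update(log_w, rewards)

print("mixture over discriminators:", np.round(mixture, 3))
```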
A Cross-Domain Transferable Neural Coherence Model
Peng Xu
H. Saghir
Jin Sung Kang
Teng Long
Avishek Joey Bose
Yanshuai Cao
Coherence is an important aspect of text quality and is crucial for ensuring its readability. One important limitation of existing coherence models is that training on one domain does not easily generalize to unseen categories of text. Previous work advocates for generative models for cross-domain generalization, because for discriminative models, the space of incoherent sentence orderings to discriminate against during training is prohibitively large. In this work, we propose a local discriminative neural model with a much smaller negative sampling space that can efficiently learn against incorrect orderings. The proposed coherence model is simple in structure, yet it significantly outperforms previous state-of-the-art methods on a standard benchmark dataset on the Wall Street Journal corpus, as well as in multiple new challenging settings of transfer to unseen categories of discourse on Wikipedia articles.
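A minimal sketch of the local discriminative setup (toy sentence vectors and a fixed bilinear scorer standing in for the paper's neural model and its training): each adjacent sentence pair is scored against a negative pair formed by swapping in a sampled sentence, with a margin loss over the local pairs only, which is what keeps the negative sampling space small.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
W = rng.normal(scale=0.1, size=(d, d))           # bilinear scorer (stand-in model)

def score(s_a, s_b):
    return float(s_a @ W @ s_b)

# Toy "document": a sequence of sentence embeddings.
doc = rng.normal(size=(5, d))

losses = []
for i in range(len(doc) - 1):
    pos = score(doc[i], doc[i + 1])              # true adjacent pair
    neg_sentence = rng.normal(size=d)            # sampled incoherent continuation
    neg = score(doc[i], neg_sentence)
    losses.append(max(0.0, 1.0 - pos + neg))     # margin loss on the local pair

print("mean margin loss:", np.mean(losses))
```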
EditNTS: An Neural Programmer-Interpreter Model for Sentence Simplification through Explicit Editing
Yue Dong
Zichao Li
Mehdi Rezagholizadeh
We present the first sentence simplification model that learns explicit edit operations (ADD, DELETE, and KEEP) via a neural programmer-interpreter approach. Most current neural sentence simplification systems are variants of sequence-to-sequence models adopted from machine translation. These methods learn to simplify sentences as a byproduct of the fact that they are trained on complex-simple sentence pairs. By contrast, our neural programmer-interpreter is directly trained to predict explicit edit operations on targeted parts of the input sentence, resembling the way that humans perform simplification and revision. Our model outperforms previous state-of-the-art neural sentence simplification models (without external knowledge) by large margins on three benchmark text simplification corpora in terms of SARI (+0.95 WikiLarge, +1.89 WikiSmall, +1.41 Newsela), and is judged by humans to produce overall better and simpler output sentences.
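The interpreter side of such a model is easy to picture: an edit program over the source tokens is executed left to right. The sketch below shows one plausible execution semantics for KEEP/DELETE/ADD, with ADD encoded as a hypothetical "ADD_word" label; the exact conventions used in EditNTS may differ, so treat this as an assumption for illustration.

```python
def apply_edits(source_tokens, edit_program):
    """Execute a simple KEEP/DELETE/ADD(word) edit program over source tokens."""
    output, i = [], 0
    for op in edit_program:
        if op == "KEEP":
            output.append(source_tokens[i]); i += 1   # copy the current source token
        elif op == "DELETE":
            i += 1                                    # drop the current source token
        elif op.startswith("ADD_"):
            output.append(op[len("ADD_"):])           # insert a new word, keep position
        else:
            raise ValueError(f"unknown edit operation: {op}")
    output.extend(source_tokens[i:])                  # copy any untouched tail
    return output

src = ["the", "committee", "ratified", "the", "proposal"]
prog = ["KEEP", "KEEP", "DELETE", "ADD_approved", "KEEP", "KEEP"]
print(" ".join(apply_edits(src, prog)))               # "the committee approved the proposal"
```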