Interpolation Consistency Training for Semi-Supervised Learning
Vikas Verma
Kenji Kawaguchi
Alex Lamb
Juho Kannala
David Lopez-Paz
Arno Solin
Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment
Adrien Ali Taiga
William Fedus
Marlos C. Machado
This paper provides an empirical evaluation of recently developed exploration algorithms within the Arcade Learning Environment (ALE). We study the use of different reward bonuses that incentivize exploration in reinforcement learning. We do so by fixing the learning algorithm used and focusing only on the impact of the different exploration bonuses on the agent's performance. We use Rainbow, the state-of-the-art algorithm for value-based agents, and focus on some of the bonuses proposed in the last few years. We consider the impact these algorithms have on performance within the popular game Montezuma's Revenge, which has gathered a lot of interest from the exploration community; across the set of seven games identified by Bellemare et al. (2016) as challenging for exploration; and on easier games where exploration is not an issue. We find that, in our setting, recently developed bonuses do not provide significantly improved performance on Montezuma's Revenge or hard exploration games. We also find that existing bonus-based methods may negatively impact performance on games in which exploration is not an issue and may even perform worse than ε-greedy exploration.
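To make the benchmarked idea concrete, here is a minimal Python sketch (ours, not the paper's code) of a count-based reward bonus in the spirit of Bellemare et al. (2016): the environment reward is augmented by beta / sqrt(N(s)), where N(s) counts visits to a state. The class name CountBonus, the coefficient beta, and the assumption of a hashable, discretized state are all illustrative.

```python
import math
from collections import defaultdict

class CountBonus:
    """Count-based exploration bonus: r + beta / sqrt(N(s))."""

    def __init__(self, beta=0.1):
        self.beta = beta
        self.counts = defaultdict(int)   # visit counts N(s)

    def __call__(self, state, env_reward):
        key = tuple(state)               # assumes a hashable, discretized state
        self.counts[key] += 1
        return env_reward + self.beta / math.sqrt(self.counts[key])
```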
A principled approach for generating adversarial images under non-smooth dissimilarity metrics
Aram-Alexandre Pooladian
Chris J. Finlay
Tim Hoheisel
Adam M. Oberman
Deep neural networks perform well on real-world data but are prone to adversarial perturbations: small changes in the input easily lead to misclassification. In this work, we propose an attack methodology not only for cases where the perturbations are measured by ℓp norms, but in fact for any adversarial dissimilarity metric with a closed proximal form.
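To illustrate what a closed proximal form buys in this setting, the sketch below (our own construction; function names and step sizes are not from the paper) performs one proximal-gradient attack step for an ℓ1-measured perturbation: the smooth classification loss is handled by a gradient step and the non-smooth ℓ1 term by its proximal operator, soft thresholding.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def proximal_attack_step(delta, grad_loss, step=0.01, lam=0.001):
    """One proximal-gradient step for an l1-measured adversarial perturbation.

    `grad_loss` is the gradient of the classification loss w.r.t. `delta`;
    the gradient step ascends the loss, and the prox handles the non-smooth
    penalty lam * ||delta||_1 in closed form instead of via subgradients.
    """
    z = delta + step * grad_loss
    return soft_threshold(z, step * lam)
```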
An Empirical Study of Batch Normalization and Group Normalization in Conditional Computation
Vincent Michalski
Vikram Voleti
Anthony Ortiz
Batch normalization has been widely used to improve optimization in deep neural networks. While the uncertainty in batch statistics can act as a regularizer, using these dataset statistics specific to the training set impairs generalization in certain tasks. Recently, alternative methods for normalizing feature activations in neural networks have been proposed. Among them, group normalization has been shown to yield similar, and in some domains even superior, performance to batch normalization. All these methods utilize a learned affine transformation after the normalization operation to increase representational power. Methods used in conditional computation define the parameters of these transformations as learnable functions of conditioning information. In this work, we study whether and where the conditional formulation of group normalization can improve generalization compared to conditional batch normalization. We evaluate performance on the tasks of visual question answering, few-shot learning, and conditional image generation.
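For concreteness, this is a minimal PyTorch sketch (our own wiring, following the usual conditional-normalization recipe rather than the paper's exact code) of conditional group normalization: the post-normalization affine parameters are predicted from a conditioning vector, e.g. a question embedding in visual question answering.

```python
import torch.nn as nn

class ConditionalGroupNorm(nn.Module):
    """Group norm whose affine transform is a function of conditioning info."""

    def __init__(self, num_groups, num_channels, cond_dim):
        super().__init__()
        self.norm = nn.GroupNorm(num_groups, num_channels, affine=False)
        self.to_gamma = nn.Linear(cond_dim, num_channels)
        self.to_beta = nn.Linear(cond_dim, num_channels)

    def forward(self, x, cond):
        # x: (batch, channels, height, width); cond: (batch, cond_dim)
        gamma = self.to_gamma(cond).unsqueeze(-1).unsqueeze(-1)
        beta = self.to_beta(cond).unsqueeze(-1).unsqueeze(-1)
        # Predict a delta around the identity scale for stable training.
        return (1.0 + gamma) * self.norm(x) + beta
```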
On the impressive performance of randomly weighted encoders in summarization tasks
Jonathan Pilault
Jaehong Park
In this work, we investigate the performance of untrained randomly initialized encoders in a general class of sequence-to-sequence models and compare their performance with that of fully-trained encoders on the task of abstractive summarization. We hypothesize that random projections of an input text have enough representational power to encode the hierarchical structure of sentences and the semantics of documents. Using a trained decoder to produce abstractive text summaries, we empirically demonstrate that architectures with untrained randomly initialized encoders perform competitively with respect to the equivalent architectures with fully-trained encoders. We further find that the capacity of the encoder not only improves overall model generalization but also closes the performance gap between untrained randomly initialized and fully-trained encoders. To our knowledge, this is the first time that general sequence-to-sequence models with attention have been assessed with both trained and randomly projected representations on abstractive summarization.
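A minimal sketch (ours, assuming a standard recurrent seq2seq setup rather than the paper's exact architecture) of the untrained-encoder condition: the encoder's parameters are frozen at their random initialization, so it acts as a fixed random projection of the input text, and only the attention decoder would be trained on summarization.

```python
import torch.nn as nn

class RandomEncoder(nn.Module):
    """Encoder frozen at random initialization: a fixed random projection."""

    def __init__(self, vocab_size, emb_dim=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True,
                           bidirectional=True)
        for p in self.parameters():      # never updated during training
            p.requires_grad = False

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer ids
        outputs, _ = self.rnn(self.embed(tokens))
        return outputs                   # consumed by a *trained* decoder
```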
Combined Reinforcement Learning via Abstract Representations
Vincent Francois-Lavet
In the quest for efficient and robust reinforcement learning methods, both model-free and model-based approaches offer advantages. In this paper we propose a new way of explicitly bridging both approaches via a shared low-dimensional learned encoding of the environment, meant to capture summarizing abstractions. We show that the modularity brought by this approach leads to good generalization while being computationally efficient, with planning happening in a smaller latent state space. In addition, this approach recovers a sufficient low-dimensional representation of the environment, which opens up new strategies for interpretable AI, exploration and transfer learning.
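The sketch below (our own wiring, assuming vector observations and discrete actions) shows the shared-encoding idea in miniature: an encoder maps observations to a small abstract state, and a transition model, a reward model, and a Q head all operate in that latent space, so model-based planning and model-free value learning share one representation.

```python
import torch
import torch.nn as nn

class LatentWorldModel(nn.Module):
    """Shared low-dimensional encoding for model-based and model-free RL."""

    def __init__(self, obs_dim, n_actions, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                     nn.Linear(64, latent_dim))
        self.transition = nn.Linear(latent_dim + n_actions, latent_dim)
        self.reward = nn.Linear(latent_dim + n_actions, 1)
        self.q_head = nn.Linear(latent_dim, n_actions)  # model-free branch

    def forward(self, obs, action_onehot):
        z = self.encoder(obs)                       # abstract state
        za = torch.cat([z, action_onehot], dim=-1)
        return self.transition(za), self.reward(za), self.q_head(z)
```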
A Comparative Analysis of Expected and Distributional Reinforcement Learning
Since their introduction a year ago, distributional approaches to reinforcement learning (distributional RL) have produced strong results relative to the standard approach which models expected values (expected RL). However, aside from convergence guarantees, there have been few theoretical results investigating the reasons behind the improvements distributional RL provides. In this paper we begin the investigation into this fundamental question by analyzing the differences in the tabular, linear approximation, and non-linear approximation settings. We prove that in many realizations of the tabular and linear approximation settings, distributional RL behaves exactly the same as expected RL. In cases where the two methods behave differently, distributional RL can in fact hurt performance when it does not induce identical behaviour. We then continue with an empirical analysis comparing distributional and expected RL methods in control settings with non-linear approximators to tease apart where the improvements from distributional RL methods are coming from.
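For readers new to the distinction, the two Bellman operators being compared can be written as follows (standard textbook definitions, not this paper's notation):

```latex
% Expected RL: the Bellman operator acts on expected values.
(T^{\pi} Q)(s, a) = \mathbb{E}[R(s, a)]
  + \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s, a),\, a' \sim \pi}[Q(s', a')]

% Distributional RL: the operator acts on the full return distribution,
% where Z(s, a) is a random variable and \overset{D}{=} denotes equality
% in distribution.
(T^{\pi} Z)(s, a) \overset{D}{=} R(s, a) + \gamma \, Z(S', A'),
\qquad S' \sim P(\cdot \mid s, a), \quad A' \sim \pi(\cdot \mid S')
```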
Contextualized Non-local Neural Networks for Sequence Learning
Pengfei Liu
Shuaichen Chang
Xuanjing Huang
Recently, a large number of neural mechanisms and models have been proposed for sequence learning, of which self-attention, as exemplified by the Transformer model, and graph neural networks (GNNs) have attracted much attention. In this paper, we propose an approach that combines and draws on the complementary strengths of these two methods. Specifically, we propose contextualized non-local neural networks (CN3), which can both dynamically construct a task-specific structure of a sentence and leverage rich local dependencies within a particular neighbourhood. Experimental results on ten NLP tasks in text classification, semantic matching, and sequence labelling show that our proposed model outperforms competitive baselines and discovers task-specific dependency structures, thus providing better interpretability to users.
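As a rough illustration of the combination (our own minimal version, not the paper's CN3 architecture), one round of the idea can be written as attention that builds a soft, task-specific sentence graph, followed by a GNN-style message pass over it:

```python
import torch
import torch.nn.functional as F

def contextual_nonlocal(h, temperature=1.0):
    """One attention-built message pass over a dynamically constructed graph.

    h: (batch, seq_len, dim) token representations.
    """
    scores = torch.matmul(h, h.transpose(1, 2)) / temperature
    adjacency = F.softmax(scores, dim=-1)    # soft, task-specific structure
    return h + torch.matmul(adjacency, h)    # aggregate neighbour messages
```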
Generating Character Descriptions for Automatic Summarization of Fiction
Weiwei Zhang
J. Oren
Summaries of fictional stories allow readers to quickly decide whether or not a story catches their interest. A major challenge in automatic summarization of fiction is the lack of standardized evaluation methodology or high-quality datasets for experimentation. In this work, we take a bottom-up approach to this problem by assuming that story authors are uniquely qualified to inform such decisions. We collect a dataset of one million fiction stories with accompanying author-written summaries from Wattpad, an online story sharing platform. We identify commonly occurring summary components, of which a description of the main characters is the most frequent, and elicit descriptions of main characters directly from the authors for a sample of the stories. We propose two approaches to generate character descriptions, one based on ranking attributes found in the story text, the other based on classifying into a list of pre-defined attributes. We find that the classification-based approach performs best in predicting character descriptions.
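As a toy illustration of the ranking-based approach (our own simplification; the paper's actual ranker is more involved), candidate attribute words can be scored by how often they occur in the story text:

```python
from collections import Counter

def rank_character_attributes(story_tokens, candidate_attributes, top_k=3):
    """Rank candidate attributes by their frequency in the story text."""
    counts = Counter(story_tokens)
    ranked = sorted(candidate_attributes,
                    key=lambda attr: counts[attr], reverse=True)
    return ranked[:top_k]
```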
Learning Multi-Task Communication with Message Passing for Sequence Learning
Pengfei Liu
Jie Fu
Yue Dong
Xipeng Qiu
We present two architectures for multi-task learning with neural sequence models. Our approach allows the relationships between different tasks to be learned dynamically, rather than using an ad-hoc pre-defined structure as in previous work. We adopt the idea from message-passing graph neural networks, and propose a general graph multi-task learning framework in which different tasks can communicate with each other in an effective and interpretable way. We conduct extensive experiments in text classification and sequence labelling to evaluate our approach on multi-task learning and transfer learning. The empirical results show that our models not only outperform competitive baselines, but also learn interpretable and transferable patterns across tasks.
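A minimal sketch (ours, not the paper's exact formulation) of tasks communicating through message passing: each task keeps a state vector, one communication round mixes the states through a learned, softmax-normalized task-task adjacency, and a GRU cell updates each task's state, so the inter-task structure is learned rather than pre-defined.

```python
import torch
import torch.nn as nn

class TaskMessagePassing(nn.Module):
    """One round of message passing between task nodes with learned edges."""

    def __init__(self, n_tasks, dim):
        super().__init__()
        self.edge_logits = nn.Parameter(torch.zeros(n_tasks, n_tasks))
        self.update = nn.GRUCell(dim, dim)

    def forward(self, task_states):
        # task_states: (n_tasks, dim)
        weights = torch.softmax(self.edge_logits, dim=-1)  # learned structure
        messages = weights @ task_states                   # aggregate
        return self.update(messages, task_states)          # update states
```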
Learning Options with Interest Functions
Learning temporal abstractions that are partial solutions to a task and can be reused for solving other tasks can help agents plan and learn efficiently. In this work, we tackle this problem in the options framework. We aim to autonomously learn options which are specialized in different regions of the state space by proposing a notion of interest functions, which generalizes initiation sets from the options framework for function approximation. We build on the option-critic framework to derive policy gradient theorems for interest functions, leading to a new interest-option-critic architecture.
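In a sketch (our notation, assuming the natural reweighting an interest function induces), an interest value I(s, o) in [0, 1] generalizes a binary initiation set by softly gating the policy over options:

```python
import numpy as np

def option_probs(interest, policy_over_options):
    """Reweight the policy over options by per-option interest, renormalize."""
    weighted = interest * policy_over_options
    return weighted / weighted.sum()

# Hypothetical usage with three options in some state:
interest = np.array([0.9, 0.1, 0.5])      # soft initiation values I(s, o)
pi_omega = np.array([0.3, 0.4, 0.3])      # policy over options
probs = option_probs(interest, pi_omega)  # sample the next option from this
```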
Leveraging Observations in Bandits: Between Risks and Benefits
Andrei-Stefan Lupu
Imitation learning has been widely used to speed up learning in novice agents, by allowing them to leverage existing data from experts. Allowing an agent to be influenced by external observations can benefit the learning process, but it also puts the agent at risk of following sub-optimal behaviours. In this paper, we study this problem in the context of bandits. More specifically, we consider that an agent (learner) is interacting with a bandit-style decision task, but can also observe a target policy interacting with the same environment. The learner observes only the target’s actions, not the rewards obtained. We introduce a new bandit optimism modifier that uses conditional optimism contingent on the actions of the target in order to guide the agent’s exploration. We analyze the effect of this modification on the well-known Upper Confidence Bound algorithm by proving that it preserves a regret upper bound of order O(ln T), even in the presence of a very poor target, and we derive the dependency of the expected regret on the general target policy. We provide empirical results showing both great benefits as well as certain limitations inherent to observational learning in the multi-armed bandit setting. Experiments are conducted using targets satisfying theoretical assumptions with high probability, thus narrowing the gap between theory and application.
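A minimal sketch (our own; the paper's modifier and constants may differ) of conditional optimism layered on UCB: the arm the observed target just played receives an inflated exploration bonus, nudging the learner toward it without trusting it blindly.

```python
import math

def ucb_with_target(counts, means, t, target_action, extra=1.0):
    """UCB arm selection with extra optimism on the target's observed action."""
    best, best_score = None, -float("inf")
    for a, (n, mu) in enumerate(zip(counts, means)):
        if n == 0:
            return a                     # play every arm once first
        bonus = math.sqrt(2.0 * math.log(t) / n)
        if a == target_action:
            bonus *= 1.0 + extra         # conditional optimism toward target
        if mu + bonus > best_score:
            best, best_score = a, mu + bonus
    return best
```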