Natural Language Processing and Text Mining with Graph-Structured Representations
N-BEATS: Neural basis expansion analysis for interpretable time series forecasting
Boris Oreshkin
Dmitri Carpov
We focus on solving the univariate time series point forecasting problem using deep learning. We propose a deep neural architecture based on backward and forward residual links and a very deep stack of fully-connected layers. The architecture has a number of desirable properties, being interpretable, applicable without modification to a wide array of target domains, and fast to train. We test the proposed architecture on several well-known datasets, including M3, M4 and TOURISM competition datasets containing time series from diverse domains. We demonstrate state-of-the-art performance for two configurations of N-BEATS for all the datasets, improving forecast accuracy by 11% over a statistical benchmark and by 3% over last year's winner of the M4 competition, a domain-adjusted hand-crafted hybrid between neural network and statistical time series models. The first configuration of our model does not employ any time-series-specific components and its performance on heterogeneous datasets strongly suggests that, contrary to received wisdom, deep learning primitives such as residual blocks are by themselves sufficient to solve a wide range of forecasting problems. Finally, we demonstrate how the proposed architecture can be augmented to provide outputs that are interpretable without considerable loss in accuracy.
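The "backward and forward residual links" mentioned in the abstract can be illustrated with a small sketch. It assumes that each block returns a backcast (the part of its input it explains) and a forecast, that the backcast is subtracted from the input before the next block, and that the block forecasts are summed. The toy block `make_mean_block` is a hypothetical stand-in for the fully-connected stacks, not the authors' implementation.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Block = Callable[[Vector], Tuple[Vector, Vector]]


def make_mean_block(horizon: int, scale: float) -> Block:
    """Hypothetical toy block: explains a fraction `scale` of the input mean."""
    def block(x: Vector) -> Tuple[Vector, Vector]:
        level = scale * sum(x) / len(x)
        backcast = [level] * len(x)   # part of the input this block explains
        forecast = [level] * horizon  # this block's contribution to the forecast
        return backcast, forecast
    return block


def doubly_residual_forecast(
    x: Vector, blocks: List[Block], horizon: int
) -> Tuple[Vector, Vector]:
    """Run blocks in sequence with backward and forward residual links."""
    residual = list(x)
    total = [0.0] * horizon
    for block in blocks:
        backcast, forecast = block(residual)
        # Backward link: remove what this block explained from the input.
        residual = [r - b for r, b in zip(residual, backcast)]
        # Forward link: accumulate the partial forecasts.
        total = [t + f for t, f in zip(total, forecast)]
    return total, residual


blocks = [make_mean_block(horizon=2, scale=0.5) for _ in range(2)]
forecast, residual = doubly_residual_forecast([2.0, 4.0, 6.0], blocks, horizon=2)
```

Because each block only sees what earlier blocks left unexplained, the final forecast decomposes into per-block contributions, which is one way to read the interpretability claim.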
Online Fast Adaptation and Knowledge Accumulation (OSAKA): a New Approach to Continual Learning
Massimo Caccia
Pau Rodriguez
Oleksiy Ostapenko
Fabrice Normandin
Min Lin
Lucas Caccia
Issam Hadj Laradji
Alexandre Lacoste
David Vazquez
An operator view of policy gradient methods
Dibya Ghosh
Marlos C. Machado
We cast policy gradient methods as the repeated application of two operators: a policy improvement operator …
PAST DSAA KEYNOTE SPEAKERS
An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs. By utilizing fast matrix block-approximation techniques, we propose an approximate framework for such non-trivial ERGs that results in dyadic independence (i.e., edge independent) distributions, while being able to meaningfully model local information of the graph (e.g., degrees) as well as global information (e.g., clustering coefficient, assortativity, etc.) if desired. This allows one to efficiently generate random networks with similar properties as an observed network, and the models can be used for several downstream tasks such as link prediction. Our methods are scalable to sparse graphs consisting of millions of nodes. Empirical evaluation demonstrates competitiveness in terms of both speed and accuracy with state-of-the-art methods for link prediction, which are typically based on embedding the graph into some low-dimensional space, showcasing the potential of a more direct and interpretable probabilistic model for this task.
Practical Dynamic SC-Flip Polar Decoders: Algorithm and Implementation
Furkan Ercan
Thibaud Tonnellier
Nghia Doan
SC-Flip (SCF) is a low-complexity polar code decoding algorithm with improved performance, and is an alternative to high-complexity (CRC)-aided SC-List (CA-SCL) decoding. However, the performance improvement of SCF is limited since it can correct up to only one channel error (
Principal Neighbourhood Aggregation for Graph Nets
Gabriele Corso
Luca Cavalleri
Pietro Lio
Petar Veličković
Gradient-Based Neural DAG Learning with Interventions
Philippe Brouillard
Sébastien Lachapelle
Alexandre Lacoste
Decision making based on statistical association alone can be a dangerous endeavor due to non-causal associations. Ideally, one would rely on causal relationships that enable reasoning about the effect of interventions. Several methods have been proposed to discover such relationships from observational and interventional data. Among them, GraN-DAG, a method that relies on the constrained optimization of neural networks, was shown to produce state-of-the-art results among algorithms relying purely on observational data. However, it is limited to observational data and cannot make use of interventions. In this work, we extend GraN-DAG to support interventional data and show that this improves its ability to infer causal structures
Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives
Anirudh Goyal
Shagun Sodhani
Jonathan Binas
Xue Bin Peng
Sergey Levine
Reinforcement learning agents that operate in diverse and complex environments can benefit from the structured decomposition of their behavior. Often, this is addressed in the context of hierarchical reinforcement learning, where the aim is to decompose a policy into lower-level primitives or options, and a higher-level meta-policy that triggers the appropriate behaviors for a given situation. However, the meta-policy must still produce appropriate decisions in all states. In this work, we propose a policy design that decomposes into primitives, similarly to hierarchical reinforcement learning, but without a high-level meta-policy. Instead, each primitive can decide for itself whether it wishes to act in the current state. We use an information-theoretic mechanism for enabling this decentralized decision: each primitive chooses how much information it needs about the current state to make a decision and the primitive that requests the most information about the current state acts in the world. The primitives are regularized to use as little information as possible, which leads to natural competition and specialization. We experimentally demonstrate that this policy architecture improves over both flat and hierarchical policies in terms of generalization.
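The selection rule described above, where the primitive that requests the most information acts, can be sketched as follows. This is a minimal illustration under stated assumptions: each primitive is assumed to encode the state as a diagonal Gaussian, and "information requested" is measured as the KL divergence from that Gaussian to a standard normal prior. The function names and the hard argmax are hypothetical choices for illustration and may differ from the authors' method.

```python
import math
from typing import List, Sequence, Tuple

Encoding = Tuple[Sequence[float], Sequence[float]]  # (mean, std) per primitive


def kl_to_standard_normal(mean: Sequence[float], std: Sequence[float]) -> float:
    """KL( N(mean, diag(std^2)) || N(0, I) ) for a diagonal Gaussian."""
    return 0.5 * sum(
        m * m + s * s - 1.0 - 2.0 * math.log(s) for m, s in zip(mean, std)
    )


def select_primitive(encodings: List[Encoding]) -> Tuple[int, List[float]]:
    """Return the index of the primitive requesting the most information."""
    info = [kl_to_standard_normal(mean, std) for mean, std in encodings]
    winner = max(range(len(info)), key=info.__getitem__)
    return winner, info


# Primitive 0 ignores the state (its encoding equals the prior);
# primitive 1 encodes something about it and therefore wins.
encodings = [([0.0, 0.0], [1.0, 1.0]), ([2.0, 0.0], [1.0, 1.0])]
winner, info = select_primitive(encodings)
```

Penalising the same KL term during training would correspond to the regularisation the abstract mentions: a primitive pays for the information it uses, so it only competes in states where acting is worth that cost.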
In Search of Robust Measures of Generalization
Brady Neal
Nitarshan Rajkumar
Ethan Caballero
Linbo Wang
Daniel M. Roy
One of the principal scientific challenges in deep learning is explaining generalization, i.e., why the particular way the community now trains networks to achieve small training error also leads to small error on held-out data from the same population. It is widely appreciated that some worst-case theories -- such as those based on the VC dimension of the class of predictors induced by modern neural network architectures -- are unable to explain empirical performance. A large volume of work aims to close this gap, primarily by developing bounds on generalization error, optimization error, and excess risk. When evaluated empirically, however, most of these bounds are numerically vacuous. Focusing on generalization bounds, this work addresses the question of how to evaluate such bounds empirically. Jiang et al. (2020) recently described a large-scale empirical study aimed at uncovering potential causal relationships between bounds/measures and generalization. Building on their study, we highlight where their proposed methods can obscure failures and successes of generalization measures in explaining generalization. We argue that generalization measures should instead be evaluated within the framework of distributional robustness.
SENET: A Semantic Web for Supporting Automation of Software Engineering Tasks
Yalin Liu
Jinfeng Lin
Jane Cleland-Huang
Michael Vierhauser
Sugandha Lohar
The use of Natural Language (NL) interfaces to allow devices and applications to respond to verbal commands or free-form textual queries is becoming increasingly prevalent in our society. To a large extent, their success in interpreting and responding to a request is dependent upon rich underlying ontologies and conceptual models that understand the technical or domain-specific vocabulary of diverse users. The effective use of NL interfaces in the Software Engineering (SE) domains requires its own ontology models focusing upon software-related terms and concepts. While many SE glossaries exist, they are often incomplete and tend to define the vocabulary for specific sub-fields without capturing associations between terms and phrases. This limits their usefulness for supporting NL-related tasks. In this paper we propose an approach for constructing and evolving a semantic network of software engineering concepts and phrases. Our approach starts with a set of existing SE glossaries, uses the existing glossary terms and explicitly defined associations as a starting point, uses machine learning-based techniques to dynamically identify and document additional associations between terms, leverages the network to interpret NL queries in the SE domain, and finally augments the resulting semantic network with feedback provided by users. We evaluate the viability of our approach within the sub-domain of Agile Software Development, focusing on requirements-related queries, and show that the semantic network enhances the ability of an NL interface to correctly interpret and execute user queries.
Small-GAN: Speeding Up GAN Training Using Core-sets
Samarth Sinha
Han Zhang
Anirudh Goyal
Augustus Odena
Recent work by Brock et al. (2018) suggests that Generative Adversarial Networks (GANs) benefit disproportionately from large mini-batch sizes. Unfortunately, using large batches is slow and expensive on conventional hardware. Thus, it would be nice if we could generate batches that were effectively large though actually small. In this work, we propose a method to do this, inspired by the use of Coreset-selection in active learning. When training a GAN, we draw a large batch of samples from the prior and then compress that batch using Coreset-selection. To create effectively large batches of 'real' images, we create a cached dataset of Inception activations of each training image, randomly project them down to a smaller dimension, and then use Coreset-selection on those projected activations at training time. We conduct experiments showing that this technique substantially reduces training time and memory usage for modern GAN variants, that it reduces the fraction of dropped modes in a synthetic dataset, and that it allows GANs to reach a new state of the art in anomaly detection.
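The "draw a large batch, then compress it" step can be sketched with greedy k-center selection (farthest-first traversal), a common way to build a core-set. Whether this matches the authors' exact selection procedure is an assumption, and the function name, batch sizes, and latent dimension below are hypothetical.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def greedy_k_center(points: List[Point], k: int, first: int = 0) -> List[int]:
    """Return indices of k points chosen by farthest-first traversal."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [first]
    # Distance from every point to its nearest selected center so far.
    nearest = [math.dist(p, points[first]) for p in points]
    while len(selected) < k:
        # Pick the point that the current centers cover worst.
        idx = max(range(len(points)), key=nearest.__getitem__)
        selected.append(idx)
        nearest = [
            min(d, math.dist(p, points[idx])) for d, p in zip(nearest, points)
        ]
    return selected


# Draw a large batch from a Gaussian prior, then keep a 32-sample core-set.
rng = random.Random(0)
large_batch = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(256)]
small_batch = [large_batch[i] for i in greedy_k_center(large_batch, k=32)]
```

For the 'real' images, the same routine would be applied to the randomly projected Inception activations instead of to prior samples.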