Publications

Natural Language Processing and Text Mining with Graph-Structured Representations
N-BEATS: Neural basis expansion analysis for interpretable time series forecasting
Boris Oreshkin
Dmitri Carpov
We focus on solving the univariate times series point forecasting problem using deep learning. We propose a deep neural architecture based o… (voir plus)n backward and forward residual links and a very deep stack of fully-connected layers. The architecture has a number of desirable properties, being interpretable, applicable without modification to a wide array of target domains, and fast to train. We test the proposed architecture on several well-known datasets, including M3, M4 and TOURISM competition datasets containing time series from diverse domains. We demonstrate state-of-the-art performance for two configurations of N-BEATS for all the datasets, improving forecast accuracy by 11% over a statistical benchmark and by 3% over last year's winner of the M4 competition, a domain-adjusted hand-crafted hybrid between neural network and statistical time series models. The first configuration of our model does not employ any time-series-specific components and its performance on heterogeneous datasets strongly suggests that, contrarily to received wisdom, deep learning primitives such as residual blocks are by themselves sufficient to solve a wide range of forecasting problems. Finally, we demonstrate how the proposed architecture can be augmented to provide outputs that are interpretable without considerable loss in accuracy.
Online Fast Adaptation and Knowledge Accumulation (OSAKA): a New Approach to Continual Learning.
Massimo Caccia
Pau Rodriguez
Oleksiy Ostapenko
Fabrice Normandin
Min Lin
Lucas Caccia
Issam Hadj Laradji
Alexandre Lacoste
David Vazquez
An operator view of policy gradient methods
Dibya Ghosh
Marlos C. Machado
We cast policy gradient methods as the repeated application of two operators: a policy improvement operator …
Practical Dynamic SC-Flip Polar Decoders: Algorithm and Implementation
Furkan Ercan
Thibaud Tonnellier
Nghia Doan
SC-Flip (SCF) is a low-complexity polar code decoding algorithm with improved performance, and is an alternative to high-complexity (CRC)-ai… (voir plus)ded SC-List (CA-SCL) decoding. However, the performance improvement of SCF is limited since it can correct up to only one channel error (
Principal Neighbourhood Aggregation for Graph Nets
Gabriele Corso
Luca Cavalleri
Pietro Lio
Petar Veličković
G RADIENT -B ASED N EURAL DAG L EARNING WITH I NTERVENTIONS
Philippe Brouillard
Sébastien Lachapelle
Alexandre Lacoste
Decision making based on statistical association alone can be a dangerous endeavor due to non-causal associations. Ideally, one would rely o… (voir plus)n causal relationships that enable reasoning about the effect of interventions. Several methods have been proposed to discover such relationships from observational and inter-ventional data. Among them, GraN-DAG, a method that relies on the constrained optimization of neural networks, was shown to produce state-of-the-art results among algorithms relying purely on observational data. However, it is limited to observational data and cannot make use of interventions. In this work, we extend GraN-DAG to support interventional data and show that this improves its ability to infer causal structures
In Search of Robust Measures of Generalization
Brady Neal
Nitarshan Rajkumar
Ethan Caballero
Linbo Wang
Daniel M. Roy
One of the principal scientific challenges in deep learning is explaining generalization, i.e., why the particular way the community now tra… (voir plus)ins networks to achieve small training error also leads to small error on held-out data from the same population. It is widely appreciated that some worst-case theories -- such as those based on the VC dimension of the class of predictors induced by modern neural network architectures -- are unable to explain empirical performance. A large volume of work aims to close this gap, primarily by developing bounds on generalization error, optimization error, and excess risk. When evaluated empirically, however, most of these bounds are numerically vacuous. Focusing on generalization bounds, this work addresses the question of how to evaluate such bounds empirically. Jiang et al. (2020) recently described a large-scale empirical study aimed at uncovering potential causal relationships between bounds/measures and generalization. Building on their study, we highlight where their proposed methods can obscure failures and successes of generalization measures in explaining generalization. We argue that generalization measures should instead be evaluated within the framework of distributional robustness.
SENET: A Semantic Web for Supporting Automation of Software Engineering Tasks
Yalin Liu
Jinfeng Lin
Jane Cleland-Huang
Michael Vierhauser
Sugandha Lohar
The use of Natural Language (NL) interfaces to allow devices and applications to respond to verbal commands or free-form textual queries is … (voir plus)becoming increasingly prevalent in our society. To a large extent, their success in interpreting and responding to a request is dependent upon rich underlying ontologies and conceptual models that understand the technical or domain specific vocabulary of diverse users. The effective use of NL interfaces in the Software Engineering (SE) domains requires its own ontology models focusing upon software related terms and concepts. While many SE glossaries exist, they are often incomplete and tend to define the vocabulary for specific sub-fields without capturing associations between terms and phrases. This limits their usefulness for supporting NL-related tasks. In this paper we propose an approach for constructing and evolving a semantic network of software engineering concepts and phrases. Our approach starts with a set of existing SE glossaries, uses the existing glossary terms and explicitly defined associations as a starting point, uses machine learning-based techniques to dynamically identify and document additional associations between terms, leverages the network to interpret NL queries in the SE domain, and finally augments the resulting semantic network with feedback provided by users. We evaluate the viability of our approach within the sub-domain of Agile Software Development, focusing on requirements related queries, and show that the semantic network enhances the ability of an NL interface to correctly interpret and execute user queries.
Spike-based causal inference for weight alignment
Jordan Guerguiev
Konrad Paul Kording
In artificial neural networks trained with gradient descent, the weights used for processing stimuli are also used during backward passes to… (voir plus) calculate gradients. For the real brain to approximate gradients, gradient information would have to be propagated separately, such that one set of synaptic weights is used for processing and another set is used for backward passes. This produces the so-called "weight transport problem" for biological models of learning, where the backward weights used to calculate gradients need to mirror the forward weights used to process stimuli. This weight transport problem has been considered so hard that popular proposals for biological learning assume that the backward weights are simply random, as in the feedback alignment algorithm. However, such random weights do not appear to work well for large networks. Here we show how the discontinuity introduced in a spiking system can lead to a solution to this problem. The resulting algorithm is a special case of an estimator used for causal inference in econometrics, regression discontinuity design. We show empirically that this algorithm rapidly makes the backward weights approximate the forward weights. As the backward weights become correct, this improves learning performance over feedback alignment on tasks such as Fashion-MNIST, SVHN, CIFAR-10 and VOC. Our results demonstrate that a simple learning rule in a spiking network can allow neurons to produce the right backward connections and thus solve the weight transport problem.
Stochastic Hamiltonian Gradient Methods for Smooth Games
Nicolas Loizou
Hugo Berard
Alexia Jolicoeur-Martineau
The success of adversarial formulations in machine learning has brought renewed motivation for smooth games. In this work, we focus on the c… (voir plus)lass of stochastic Hamiltonian methods and provide the first convergence guarantees for certain classes of stochastic smooth games. We propose a novel unbiased estimator for the stochastic Hamiltonian gradient descent (SHGD) and highlight its benefits. Using tools from the optimization literature we show that SHGD converges linearly to the neighbourhood of a stationary point. To guarantee convergence to the exact solution, we analyze SHGD with a decreasing step-size and we also present the first stochastic variance reduced Hamiltonian method. Our results provide the first global non-asymptotic last-iterate convergence guarantees for the class of stochastic unconstrained bilinear games and for the more general class of stochastic games that satisfy a "sufficiently bilinear" condition, notably including some non-convex non-concave problems. We supplement our analysis with experiments on stochastic bilinear and sufficiently bilinear games, where our theory is shown to be tight, and on simple adversarial machine learning formulations.
A Story of Two Streams: Reinforcement Learning Models from Human Behavior and Neuropsychiatry
Baihan Lin
Guillermo Cecchi
Djallel Bouneffouf
Jenna Reinen
Drawing an inspiration from behavioral studies of human decision making, we propose here a more general and flexible parametric framework fo… (voir plus)r reinforcement learning that extends standard Q-learning to a two-stream model for processing positive and negative rewards, and allows to incorporate a wide range of reward-processing biases -- an important component of human decision making which can help us better understand a wide spectrum of multi-agent interactions in complex real-world socioeconomic systems, as well as various neuropsychiatric conditions associated with disruptions in normal reward processing. From the computational perspective, we observe that the proposed Split-QL model and its clinically inspired variants consistently outperform standard Q-Learning and SARSA methods, as well as recently proposed Double Q-Learning approaches, on simulated tasks with particular reward distributions, a real-world dataset capturing human decision-making in gambling tasks, and the Pac-Man game in a lifelong learning setting across different reward stationarities.