Publications

Meta Attention Networks: Meta Learning Attention To Modulate Information Between Sparsely Interacting Recurrent Modules

Nan Rosemary Ke

Decomposing knowledge into interchangeable pieces promises a generalization advantage when, at some level of representation, the learner is … (see more)likely to be faced with situations requiring novel combinations of existing pieces of knowledge or computation. We hypothesize that such a decomposition of knowledge is particularly relevant for higher levels of representation as we see this at work in human cognition and natural language in the form of systematicity or systematic generalization. To study these ideas, we propose a particular training framework in which we assume that the pieces of knowledge an agent needs, as well as its reward function are stationary and can be re-used across tasks and changes in distribution. As the learner is confronted with variations in experiences, the attention selects which modules should be adapted and the parameters of those selected modules are adapted fast, while the parameters of attention mechanisms are updated slowly as meta-parameters. We ﬁnd that both the meta-learning and the modular aspects of the proposed system greatly help achieve faster learning in experiments with reinforcement learning setup involving navigation in a partially observed grid world.

2019-12-31

(published)

www.semanticscholar.org

A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms

Yoshua Bengio

Tristan Deleu

Nasim Rahaman

Nan Rosemary Ke

Sébastien Lachapelle

Olexa Bilaniuk

Anirudh Goyal

Christopher Pal

We propose to meta-learn causal structures based on how fast a learner adapts to new distributions arising from sparse distributional change… (see more)s, e.g. due to interventions, actions of agents and other sources of non-stationarities. We show that under this assumption, the correct causal structural choices lead to faster adaptation to modified distributions because the changes are concentrated in one or just a few mechanisms when the learned knowledge is modularized appropriately. This leads to sparse expected gradients and a lower effective number of degrees of freedom needing to be relearned while adapting to the change. It motivates using the speed of adaptation to a modified distribution as a meta-learning objective. We demonstrate how this can be used to determine the cause-effect relationship between two observed variables. The distributional changes do not need to correspond to standard interventions (clamping a variable), and the learner has no direct knowledge of these interventions. We show that causal structures can be parameterized via continuous variables and learned end-to-end. We then explore how these ideas could be used to also learn an encoder that would map low-level observed variables to unobserved causal variables leading to faster adaptation out-of-distribution, learning a representation space where one can satisfy the assumptions of independent mechanisms and of small and sparse changes in these mechanisms due to actions and non-stationarities.

2019-12-31

ICLR (published)

doi.org

openreview.net

Modeling Route Choice with Real-Time Information: Comparing the Recursive and Non-Recursive Approaches

Xinlian Yu

Tien Mai

Jing Ding-Mastera

Song Gao

Emma Frejinger

Transportation systems are inherently uncertain due to disruptions such as bad weather, incident and the randomness of traveler’s choices.… (see more) Real-time information allows travelers to adapt to actual traffic conditions and potentially mitigate the adverse effect of uncertainty. We study the routing policy choice problems in a stochastic time-dependent (STD) network. A routing policy is defined as a decision rule applied at the end of each link that maps the realized traffic condition to the decision on the link to take next. Two types of routing policy choice models are formulated with perfect online information (POI): recursive logit model and non-recursive logit model. In the non-recursive model, a choice set of routing policies between an origin-destination (OD) pair is generated, and a probabilistic choice is modeled at the origin, while the choice of the next link at each link is a deterministic execution of the chosen routing policy. In the recursive model, the probabilistic choice of the next link is modeled at each link, following the framework of dynamic discrete choice models. The difference between the two models results from the interplay of two sources of stochasticity, i.e., nature’s probability and choice probability. The two models are equivalent when either source of stochasticity is removed, that is, in a deterministic network (as shown in Fosgerau et al., 2013) or with deterministic choice. We use an illustrative example to explore the difference between the two models when both sources of stochasticity exist, and find that when a route has state-wise stochastic dominance over the other, the recursive model predicts more extreme choice probabilities. The relation can go either way when the two routes are non-dominated. We further compare the two models in terms of computational efficiency in estimation and prediction, and flexibility in systematic utility specification and modeling correlation.

2019-12-31

(published)

www.semanticscholar.org

Models of Human Behavioral Agents in Bandits, Contextual Bandits and RL

Baihan Lin

Guillermo Cecchi

Djallel Bouneffouf

Jenna Reinen

Irina Rish

2019-12-31

HBAI@IJCAI (published)

doi.org

Myeloarchitecture gradients in the human insula: Histological underpinnings and association to intrinsic functional connectivity

Jessica Royer

Casey Paquola

Sara Larivière

Reinder Vos de Wael

Shahin Tavakol

Alexander J. Lowe

Oualid Benkarim

Alan C. Evans

Danilo Bzdok

Jonathan Smallwood

Birgit Frauscher

Boris C. Bernhardt

2019-12-31

NeuroImage (published)

doi.org

Natural Language Processing and Text Mining with Graph-Structured Representations

Bang Liu

2019-12-31

(published)

doi.org

N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting

Boris N. Oreshkin

Dmitri Carpov

Nicolas Chapados

Yoshua Bengio

We focus on solving the univariate times series point forecasting problem using deep learning. We propose a deep neural architecture based o… (see more)n backward and forward residual links and a very deep stack of fully-connected layers. The architecture has a number of desirable properties, being interpretable, applicable without modification to a wide array of target domains, and fast to train. We test the proposed architecture on several well-known datasets, including M3, M4 and TOURISM competition datasets containing time series from diverse domains. We demonstrate state-of-the-art performance for two configurations of N-BEATS for all the datasets, improving forecast accuracy by 11% over a statistical benchmark and by 3% over last year's winner of the M4 competition, a domain-adjusted hand-crafted hybrid between neural network and statistical time series models. The first configuration of our model does not employ any time-series-specific components and its performance on heterogeneous datasets strongly suggests that, contrarily to received wisdom, deep learning primitives such as residual blocks are by themselves sufficient to solve a wide range of forecasting problems. Finally, we demonstrate how the proposed architecture can be augmented to provide outputs that are interpretable without considerable loss in accuracy.

2019-12-31

ICLR (published)

doi.org

openreview.net

Online Fast Adaptation and Knowledge Accumulation (OSAKA): a New Approach to Continual Learning

Massimo Caccia

Pau Rodríguez

Lucas Caccia

Alexandre Lacoste

Continual learning studies agents that learn from streams of tasks without forgetting previous ones while adapting to new ones. Two recent c… (see more)ontinual-learning scenarios have opened new avenues of research. In meta-continual learning, the model is pre-trained to minimize catastrophic forgetting of previous tasks. In continual-meta learning, the aim is to train agents for faster remembering of previous tasks through adaptation. In their original formulations, both methods have limitations. We stand on their shoulders to propose a more general scenario, OSAKA, where an agent must quickly solve new (out-of-distribution) tasks, while also requiring fast remembering. We show that current continual learning, meta-learning, meta-continual learning, and continual-meta learning techniques fail in this new scenario. We propose Continual-MAML, an online extension of the popular MAML algorithm as a strong baseline for this scenario. We empirically show that Continual-MAML is better suited to the new scenario than the aforementioned methodologies, as well as standard continual learning and meta-learning approaches.

2019-12-31

NeurIPS (published)

doi.org

arxiv.org

An operator view of policy gradient methods

Dibya Ghosh

Marlos C. Machado

Nicolas Roux

We cast policy gradient methods as the repeated application of two operators: a policy improvement operator …

2019-12-31

NeurIPS (published)

arxiv.org

PAST DSAA KEYNOTE SPEAKERS

Yoshua Bengio

An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs. By utilizing fas… (see more)t matrix block-approximation techniques, we propose an approximative framework to such non-trivial ERGs that result in dyadic independence (i.e., edge independent) distributions, while being able to meaningfully model local information of the graph (e.g., degrees) as well as global information (e.g., clustering coefficient, assortativity, etc.) if desired. This allows one to efficiently generate random networks with similar properties as an observed network, and the models can be used for several downstream tasks such as link prediction. Our methods are scalable to sparse graphs consisting of millions of nodes. Empirical evaluation demonstrates competitiveness in terms of both speed and accuracy with state-of-the-art methods—which are typically based on embedding the graph into some lowdimensional space— for link prediction, showcasing the potential of a more direct and interpretable probablistic model for this task.

2019-12-31

(published)

www.semanticscholar.org

Practical Dynamic SC-Flip Polar Decoders: Algorithm and Implementation

Furkan Ercan

Thibaud Tonnellier

Nghia Doan

Warren J. Gross

SC-Flip (SCF) is a low-complexity polar code decoding algorithm with improved performance, and is an alternative to high-complexity (CRC)-ai… (see more)ded SC-List (CA-SCL) decoding. However, the performance improvement of SCF is limited since it can correct up to only one channel error (

2019-12-31

IEEE Transactions on Signal Processing (published)

doi.org

arxiv.org

Principal Neighbourhood Aggregation for Graph Nets

Gabriele Corso

Luca Cavalleri

Dominique Beaini

Pietro Lio

Petar Veličković

2019-12-31

Advances in Neural Information Processing Systems 33 (NeurIPS 2020) (published)