Publications

2019-12-31

ICLR (published)

openreview.net

Measuring Systematic Generalization in Neural Proof Generation with Transformers

Nicolas Gontier

Koustuv Sinha

Siva Reddy

Christopher Pal

We are interested in understanding how well Transformer language models (TLMs) can perform reasoning tasks when trained on knowledge encoded in the form of natural language. We investigate their systematic generalization abilities on a logical reasoning task in natural language, which involves reasoning over relationships between entities grounded in first-order logical proofs. Specifically, we perform soft theorem-proving by leveraging TLMs to generate natural language proofs. We test the generated proofs for logical consistency, along with the accuracy of the final inference. We observe length-generalization issues when evaluated on longer-than-trained sequences. However, we observe TLMs improve their generalization performance after being exposed to longer, exhaustive proofs. In addition, we discover that TLMs are able to generalize better using backward-chaining proofs compared to their forward-chaining counterparts, while they find it easier to generate forward-chaining proofs. We observe that models that are not trained to generate proofs are better at generalizing to problems based on longer proofs. This suggests that Transformers have efficient internal reasoning strategies that are harder to interpret. These results highlight the systematic generalization behavior of TLMs in the context of logical reasoning, and we believe this work motivates deeper inspection of their underlying reasoning strategies.

2019-12-31

NeurIPS (published)

MeDAL: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining

Zhi Wen

Xing Han Lu

Siva Reddy

2019-12-31

arXiv (preprint)

Medical Imaging with Deep Learning: MIDL 2020 -- Short Paper Track

Tal Arbel

Ismail Ben Ayed

Marleen de Bruijne

Maxime Descoteaux

Hervé Lombaert

Chris Pal

This compendium gathers all the accepted extended abstracts from the Third International Conference on Medical Imaging with Deep Learning (MIDL 2020), held in Montreal, Canada, 6-9 July 2020. Note that only accepted extended abstracts are listed here; the Proceedings of the MIDL 2020 Full Paper Track are published in the Proceedings of Machine Learning Research (PMLR).

2019-12-31

arXiv (preprint)

Meta Attention Networks: Meta Learning Attention To Modulate Information Between Sparsely Interacting Recurrent Modules

Kanika Madan

Nan Rosemary Ke

Anirudh Goyal

Decomposing knowledge into interchangeable pieces promises a generalization advantage when, at some level of representation, the learner is likely to be faced with situations requiring novel combinations of existing pieces of knowledge or computation. We hypothesize that such a decomposition of knowledge is particularly relevant for higher levels of representation as we see this at work in human cognition and natural language in the form of systematicity or systematic generalization. To study these ideas, we propose a particular training framework in which we assume that the pieces of knowledge an agent needs, as well as its reward function, are stationary and can be re-used across tasks and changes in distribution. As the learner is confronted with variations in experiences, the attention selects which modules should be adapted and the parameters of those selected modules are adapted fast, while the parameters of attention mechanisms are updated slowly as meta-parameters. We find that both the meta-learning and the modular aspects of the proposed system greatly help achieve faster learning in experiments with a reinforcement learning setup involving navigation in a partially observed grid world.

2019-12-31

(published)

www.semanticscholar.org

A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms

Tristan Deleu

Nasim Rahaman

Nan Rosemary Ke

Sébastien Lachapelle

Olexa Bilaniuk

Anirudh Goyal

Christopher Pal

We propose to meta-learn causal structures based on how fast a learner adapts to new distributions arising from sparse distributional changes, e.g. due to interventions, actions of agents and other sources of non-stationarities. We show that under this assumption, the correct causal structural choices lead to faster adaptation to modified distributions because the changes are concentrated in one or just a few mechanisms when the learned knowledge is modularized appropriately. This leads to sparse expected gradients and a lower effective number of degrees of freedom needing to be relearned while adapting to the change. It motivates using the speed of adaptation to a modified distribution as a meta-learning objective. We demonstrate how this can be used to determine the cause-effect relationship between two observed variables. The distributional changes do not need to correspond to standard interventions (clamping a variable), and the learner has no direct knowledge of these interventions. We show that causal structures can be parameterized via continuous variables and learned end-to-end. We then explore how these ideas could be used to also learn an encoder that would map low-level observed variables to unobserved causal variables leading to faster adaptation out-of-distribution, learning a representation space where one can satisfy the assumptions of independent mechanisms and of small and sparse changes in these mechanisms due to actions and non-stationarities.

2019-12-31

ICLR (published)

openreview.net

Modeling Route Choice with Real-Time Information: Comparing the Recursive and Non-Recursive Approaches

Xinlian Yu

Tien Mai

Jing Ding-Mastera

Song Gao

Emma Frejinger

Transportation systems are inherently uncertain due to disruptions such as bad weather, incidents, and the randomness of travelers’ choices. Real-time information allows travelers to adapt to actual traffic conditions and potentially mitigate the adverse effect of uncertainty. We study the routing policy choice problems in a stochastic time-dependent (STD) network. A routing policy is defined as a decision rule applied at the end of each link that maps the realized traffic condition to the decision on the link to take next. Two types of routing policy choice models are formulated with perfect online information (POI): recursive logit model and non-recursive logit model. In the non-recursive model, a choice set of routing policies between an origin-destination (OD) pair is generated, and a probabilistic choice is modeled at the origin, while the choice of the next link at each link is a deterministic execution of the chosen routing policy. In the recursive model, the probabilistic choice of the next link is modeled at each link, following the framework of dynamic discrete choice models. The difference between the two models results from the interplay of two sources of stochasticity, i.e., nature’s probability and choice probability. The two models are equivalent when either source of stochasticity is removed, that is, in a deterministic network (as shown in Fosgerau et al., 2013) or with deterministic choice. We use an illustrative example to explore the difference between the two models when both sources of stochasticity exist, and find that when a route has state-wise stochastic dominance over the other, the recursive model predicts more extreme choice probabilities. The relation can go either way when the two routes are non-dominated. We further compare the two models in terms of computational efficiency in estimation and prediction, and flexibility in systematic utility specification and modeling correlation.

2019-12-31

(published)

www.semanticscholar.org

Models of Human Behavioral Agents in Bandits, Contextual Bandits and RL

Baihan Lin

Guillermo Cecchi

Djallel Bouneffouf

Jenna Reinen

Irina Rish

2019-12-31

HBAI@IJCAI (published)

Myeloarchitecture gradients in the human insula: Histological underpinnings and association to intrinsic functional connectivity

Jessica Royer

Casey Paquola

Sara Larivière

Reinder Vos de Wael

Shahin Tavakol

Alexander J. Lowe

Oualid Benkarim

Alan C. Evans

Danilo Bzdok

Jonathan Smallwood

Birgit Frauscher

Boris C. Bernhardt

2019-12-31

NeuroImage (published)

Natural Language Processing and Text Mining with Graph-Structured Representations

Bang Liu

2019-12-31

(published)

N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting

Boris N. Oreshkin

Dmitri Carpov

Nicolas Chapados

We focus on solving the univariate time series point forecasting problem using deep learning. We propose a deep neural architecture based on backward and forward residual links and a very deep stack of fully-connected layers. The architecture has a number of desirable properties, being interpretable, applicable without modification to a wide array of target domains, and fast to train. We test the proposed architecture on several well-known datasets, including M3, M4 and TOURISM competition datasets containing time series from diverse domains. We demonstrate state-of-the-art performance for two configurations of N-BEATS for all the datasets, improving forecast accuracy by 11% over a statistical benchmark and by 3% over last year's winner of the M4 competition, a domain-adjusted hand-crafted hybrid between neural network and statistical time series models. The first configuration of our model does not employ any time-series-specific components and its performance on heterogeneous datasets strongly suggests that, contrary to received wisdom, deep learning primitives such as residual blocks are by themselves sufficient to solve a wide range of forecasting problems. Finally, we demonstrate how the proposed architecture can be augmented to provide outputs that are interpretable without considerable loss in accuracy.

2019-12-31

ICLR (published)

openreview.net

Online Fast Adaptation and Knowledge Accumulation (OSAKA): a New Approach to Continual Learning

Massimo Caccia

Pau Rodríguez

Lucas Caccia

Alexandre Lacoste

Continual learning studies agents that learn from streams of tasks without forgetting previous ones while adapting to new ones. Two recent continual-learning scenarios have opened new avenues of research. In meta-continual learning, the model is pre-trained to minimize catastrophic forgetting of previous tasks. In continual-meta learning, the aim is to train agents for faster remembering of previous tasks through adaptation. In their original formulations, both methods have limitations. We stand on their shoulders to propose a more general scenario, OSAKA, where an agent must quickly solve new (out-of-distribution) tasks, while also requiring fast remembering. We show that current continual learning, meta-learning, meta-continual learning, and continual-meta learning techniques fail in this new scenario. We propose Continual-MAML, an online extension of the popular MAML algorithm as a strong baseline for this scenario. We empirically show that Continual-MAML is better suited to the new scenario than the aforementioned methodologies, as well as standard continual learning and meta-learning approaches.

2019-12-31

NeurIPS (published)