Publications

The default mode network in cognition: a topographical perspective
Jonathan Smallwood
Boris C Bernhardt
Robert Leech
Elizabeth Jefferies
Daniel S. Margulies
Meeting and Missing Minds: Children and Adults Use Alignment of Intuitions to Solve Pure Coordination Games
Daniel Perez-Zapata
Xavia McKenzie-Smart
Ian Apperly
In pure coordination games, players seek to coordinate responses with one another without communicating. Without a logically correct response, success depends upon players intuiting a response that is mutually obvious. Previous work suggests that such coordination requires a distinctive form of “group” thinking and sufficient mutual knowledge, but reveals little about the basis for the intuitive judgements themselves. Here, that question was addressed for the first time by examining the basis of coordination performance of groups whose intuitions might plausibly differ: children versus adults. Twenty-five 5-year-olds, 30 7-year-olds, and 25 adults undertook four types of coordination game, and novel metrics allowed “intuitive alignment” in responses to be evaluated within and between groups. All groups performed above chance, and adults showed higher levels of alignment than children, but adults and children showed different patterns in their intuitions. Implications for intergenerational understanding and misunderstanding are discussed.
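One simple way to quantify how aligned players' intuitions are is the probability that two randomly chosen players gave the same answer, computed either within one group or across two groups. The Python sketch below is a hypothetical illustration of that idea; the function names and the pairwise-agreement definition are our assumptions and are not the novel metrics reported in the paper.

```python
from collections import Counter
from fractions import Fraction


def within_group_alignment(responses):
    """Probability that two distinct, randomly chosen players gave the same response.

    Hypothetical metric for illustration: counts ordered pairs of different
    players who chose the same option, divided by all ordered pairs.
    """
    n = len(responses)
    counts = Counter(responses)
    same_pairs = sum(c * (c - 1) for c in counts.values())
    return Fraction(same_pairs, n * (n - 1))


def between_group_alignment(group_a, group_b):
    """Probability that a random player from A and a random player from B match."""
    ca, cb = Counter(group_a), Counter(group_b)
    # Counter returns 0 for options that one group never chose.
    matches = sum(ca[k] * cb[k] for k in ca)
    return Fraction(matches, len(group_a) * len(group_b))
```

For a game with m equally likely options, independent random responding gives an expected agreement of 1/m, which serves as a natural chance baseline for either quantity.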
Fixed-Points for Quantitative Equational Logics
Radu Mardare
Gordon Plotkin
We develop a fixed-point extension of quantitative equational logic and give semantics in one-bounded complete quantitative algebras. Unlike previous related work about fixed points in metric spaces, we work with the notion of approximate equality rather than exact equality. The result is a novel theory of fixed points that not only provides solutions to the traditional fixed-point equations but also lets us define the rate of convergence to the fixed point. We show that such a theory is the quantitative analogue of a Conway theory and also of an iteration theory; it also reflects the metric coinduction principle. We study the Bellman equation for a Markov decision process as an illustrative example.
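In the classical exact-equality setting, the Bellman optimality operator T of a discounted MDP is a γ-contraction in the sup norm, so Banach's fixed-point theorem gives both a unique solution v* = T v* and a convergence rate, ||T^n v_0 - v*||∞ ≤ γ^n ||v_0 - v*||∞. The minimal value-iteration sketch below illustrates only that metric picture; it does not implement the quantitative equational logic developed in the paper, and the array layout of P and R is our own assumption.

```python
import numpy as np


def bellman_operator(v, P, R, gamma):
    """Apply the Bellman optimality operator to a value vector v.

    Assumed layout: P has shape (A, S, S) with P[a, s, s'] the transition
    probability, and R has shape (A, S) with expected immediate rewards.
    """
    q = R + gamma * (P @ v)  # shape (A, S): one-step lookahead for each action
    return q.max(axis=0)     # greedy maximization over actions


def value_iteration(P, R, gamma, tol=1e-10, max_iter=10_000):
    """Iterate T from v = 0 until successive iterates differ by less than tol."""
    v = np.zeros(P.shape[1])
    for n in range(max_iter):
        v_next = bellman_operator(v, P, R, gamma)
        # T is a gamma-contraction, so the sup-norm gap between successive
        # iterates shrinks by at least a factor gamma at every step.
        if np.max(np.abs(v_next - v)) < tol:
            return v_next, n + 1
        v = v_next
    return v, max_iter
```

Because the contraction factor is γ, the number of iterations needed for a given tolerance grows roughly like log(1/tol) / log(1/γ).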
Universal Semantics for the Stochastic λ-Calculus
Pedro H. Azevedo de Amorim
Dexter Kozen
Radu Mardare
Michael Roberts
We define sound and adequate denotational and operational semantics for the stochastic lambda calculus. These two semantic approaches build on previous work that used an explicit source of randomness to reason about higher-order probabilistic programs.
Beyond Variance Reduction: Understanding the True Impact of Baselines on Policy Optimization
Valentin Thomas
Marlos C. Machado
Bandit and reinforcement learning (RL) problems can often be framed as optimization problems where the goal is to maximize average performance while having access only to stochastic estimates of the true gradient. Traditionally, stochastic optimization theory predicts that learning dynamics are governed by the curvature of the loss function and the noise of the gradient estimates. In this paper we demonstrate that this is not the case for bandit and RL problems. To allow our analysis to be interpreted in light of multi-step MDPs, we focus on techniques derived from stochastic optimization principles (e.g., natural policy gradient and EXP3) and we show that some standard assumptions from optimization theory are violated in these problems. We present theoretical results showing that, at least for bandit problems, curvature and noise are not sufficient to explain the learning dynamics and that seemingly innocuous choices like the baseline can determine whether an algorithm converges. These theoretical findings match our empirical evaluation, which we extend to multi-state MDPs.
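The role of the baseline can be illustrated with REINFORCE on a multi-armed bandit with a softmax policy. Subtracting a constant baseline b leaves the expected gradient unchanged, because Σ_a π(a) ∇ log π(a) = 0, yet it changes every sampled update and therefore the path the parameters follow. This is a minimal sketch under those assumptions (vanilla REINFORCE, a constant baseline, hypothetical reward means); it is not the natural policy gradient or EXP3 analysis from the paper.

```python
import numpy as np


def softmax(theta):
    z = np.exp(theta - theta.max())  # subtract the max for numerical stability
    return z / z.sum()


def grad_log_softmax(pi, a):
    """Gradient of log pi(a) with respect to the softmax logits."""
    g = -pi.copy()
    g[a] += 1.0
    return g


def reinforce_step(theta, rewards_mean, baseline, lr, rng, noise=0.0):
    """One sampled REINFORCE update with a constant baseline."""
    pi = softmax(theta)
    a = rng.choice(len(theta), p=pi)
    r = rewards_mean[a] + noise * rng.standard_normal()
    return theta + lr * (r - baseline) * grad_log_softmax(pi, a)


def expected_gradient(theta, rewards_mean, baseline):
    """Exact expected REINFORCE gradient, summing over all arms."""
    pi = softmax(theta)
    g = np.zeros_like(theta)
    for a, p in enumerate(pi):
        g += p * (rewards_mean[a] - baseline) * grad_log_softmax(pi, a)
    return g


def run(baseline, steps=2000, lr=0.1, seed=0):
    """Train on a hypothetical 3-armed bandit and return the final policy."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(3)
    rewards_mean = np.array([1.0, 0.7, 0.0])
    for _ in range(steps):
        theta = reinforce_step(theta, rewards_mean, baseline, lr, rng)
    return softmax(theta)
```

Comparing, for example, `run(-1.0)` with `run(2.0)` over several seeds is one way to observe that baselines with identical expected gradients can still produce different learning dynamics.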
A Brief Study on the Effects of Training Generative Dialogue Models with a Semantic Loss
Neural models trained for next-utterance generation in dialogue tasks learn to mimic the n-gram sequences in the training set with training objectives like negative log-likelihood (NLL) or cross-entropy. Such commonly used training objectives do not foster generating alternate responses to a context. However, the effects of minimizing an alternate training objective that encourages a model to generate alternate responses and scores them on semantic similarity have not been well studied. We hypothesize that a language generation model can improve its diversity by learning to generate alternate text during training and minimizing a semantic loss as an auxiliary objective. We explore this idea on two data sets of different sizes on the task of next-utterance generation in goal-oriented dialogues. We make two observations: (1) minimizing a semantic objective improved diversity in responses in the smaller data set (Frames) but was only as good as minimizing the NLL in the larger data set (MultiWoZ); (2) large language model embeddings can be more useful as a semantic loss objective than as initialization for token embeddings.
Continuous Coordination As a Realistic Scenario for Lifelong Learning
Current deep reinforcement learning (RL) algorithms are still highly task-specific and lack the ability to generalize to new environments. Lifelong learning (LLL), however, aims at solving multiple tasks sequentially by efficiently transferring and using knowledge between tasks. Despite a surge of interest in lifelong RL in recent years, the lack of a realistic testbed makes robust evaluation of LLL algorithms difficult. Multi-agent RL (MARL), on the other hand, can be seen as a natural scenario for lifelong RL due to its inherent non-stationarity, since the agents' policies change over time. In this work, we introduce a multi-agent lifelong learning testbed that supports both zero-shot and few-shot settings. Our setup is based on Hanabi -- a partially observable, fully cooperative multi-agent game that has been shown to be challenging for zero-shot coordination. Its large strategy space makes it a desirable environment for lifelong RL tasks. We evaluate several recent MARL methods, and benchmark state-of-the-art LLL algorithms in limited memory and computation regimes to shed light on their strengths and weaknesses. This continual learning paradigm also provides us with a pragmatic way of going beyond centralized training, which is the most commonly used training protocol in MARL. We empirically show that the agents trained in our setup are able to coordinate well with unseen agents, without any of the additional assumptions made by previous works. The code and all pre-trained models are available at https://github.com/chandar-lab/Lifelong-Hanabi.
A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation
Marginalized importance sampling (MIS), which measures the density ratio between the state-action occupancy of a target policy and that of a sampling distribution, is a promising approach for off-policy evaluation. However, current state-of-the-art MIS methods rely on complex optimization tricks and succeed mostly on simple toy problems. We bridge the gap between MIS and deep reinforcement learning by observing that the density ratio can be computed from the successor representation of the target policy. The successor representation can be trained through deep reinforcement learning methodology and decouples the reward optimization from the dynamics of the environment, making the resulting algorithm stable and applicable to high-dimensional domains. We evaluate the empirical performance of our approach on a variety of challenging Atari and MuJoCo environments.
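In the tabular case the link between the successor representation and the density ratio can be written in closed form. With the state-action successor representation Ψ = (I - γP^π)^{-1} and the initial state-action distribution μ0, the normalized discounted occupancy of the target policy is d^π = (1 - γ) μ0ᵀΨ, and the marginalized importance weight is w(s, a) = d^π(s, a) / d^D(s, a). The NumPy sketch below computes these quantities exactly for a small MDP, assuming d^D is strictly positive everywhere; the paper instead learns the successor representation with deep RL, and this code does not reproduce that algorithm.

```python
import numpy as np


def successor_representation(P, pi, gamma):
    """Tabular state-action successor representation Psi = (I - gamma * P_pi)^{-1}.

    Assumed shapes: P is (S, A, S) with P[s, a, s'] = Pr(s' | s, a),
    and pi is (S, A) with pi[s, a] = Pr(a | s) under the target policy.
    """
    S, A, _ = P.shape
    # P_pi[(s, a), (s', a')] = P(s' | s, a) * pi(a' | s'), flattened row-major.
    P_pi = (P[:, :, :, None] * pi[None, None, :, :]).reshape(S * A, S * A)
    return np.linalg.inv(np.eye(S * A) - gamma * P_pi)


def density_ratio(P, pi, d0, d_data, gamma):
    """Return w = d_pi / d_data (flattened) together with d_pi itself."""
    psi = successor_representation(P, pi, gamma)
    mu0 = (d0[:, None] * pi).reshape(-1)  # initial state-action distribution
    d_pi = (1.0 - gamma) * mu0 @ psi      # normalized discounted occupancy
    return d_pi / d_data.reshape(-1), d_pi
```

By construction, the reweighted data average Σ d^D w r equals (1 - γ) times the target policy's expected discounted return, which is what makes w useful for off-policy evaluation.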
Directional Graph Networks: Anisotropic Aggregation in Graph Neural Networks via Directional Vector Fields
Saro Passaro
William L. Hamilton
Gabriele Corso
Pietro Lio
The lack of anisotropic kernels in graph neural networks (GNNs) strongly limits their expressiveness, contributing to well-known issues such as over-smoothing. To overcome this limitation, we propose the first globally consistent anisotropic kernels for GNNs, allowing for graph convolutions that are defined according to topologically derived directional flows. First, by defining a vector field in the graph, we develop a method of applying directional derivatives and smoothing by projecting node-specific messages into the field. Then, we propose the use of the Laplacian eigenvectors as such a vector field. We show that the method generalizes CNNs on an
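As a rough illustration of the directional derivative described in this abstract, take a Laplacian eigenvector φ, define a field on the edges by F_ij = φ_j - φ_i, normalize each row by its L1 norm, and aggregate neighbor features as y_i = Σ_j F̂_ij (x_j - x_i). The NumPy sketch below follows that recipe; the sign convention, the normalization, and the ε term are our assumptions, and since the abstract is truncated, this should not be read as the exact formulation in the paper.

```python
import numpy as np


def laplacian_eigvec(adj, k=1):
    """Eigenvector k (ascending eigenvalue order) of the combinatorial Laplacian.

    For a connected graph, column 0 is the constant eigenvector, so k=1
    gives the first non-trivial (Fiedler) vector.
    """
    lap = np.diag(adj.sum(axis=1)) - adj
    _, vecs = np.linalg.eigh(lap)  # eigh returns eigenvalues in ascending order
    return vecs[:, k]


def directional_aggregators(adj, phi, eps=1e-8):
    """Build smoothing (B_av) and derivative (B_dx) matrices from the field of phi."""
    F = adj * (phi[None, :] - phi[:, None])                   # F_ij = phi_j - phi_i on edges
    F_hat = F / (np.abs(F).sum(axis=1, keepdims=True) + eps)  # row-wise L1 normalization
    B_av = np.abs(F_hat)                                      # direction-weighted averaging
    B_dx = F_hat - np.diag(F_hat.sum(axis=1))                 # (B_dx x)_i = sum_j F_hat_ij (x_j - x_i)
    return B_av, B_dx
```

A constant signal has zero directional derivative under B_dx, and on a path graph the Fiedler vector is monotone along the path, so it defines one consistent direction from one end to the other.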
Educating the future generation of researchers: A cross-disciplinary survey of trends in analysis methods
Taylor Bolt
Jason S. Nomi
Lucina Q. Uddin
Methods for data analysis in the biomedical, life, and social (BLS) sciences are developing at a rapid pace. At the same time, there is increasing concern that education in quantitative methods is failing to adequately prepare students for contemporary research. These trends have led to calls for reform of undergraduate and graduate quantitative research methods curricula. We argue that such reform should be based on data-driven insights into within- and cross-disciplinary use of analytic methods. Our survey of peer-reviewed literature analyzed approximately 1.3 million openly available research articles to monitor cross-disciplinary mentions of analytic methods over the past decade. We applied data-driven text mining analyses to the “Methods” and “Results” sections of a large subset of this corpus to identify trends in analytic method mentions shared across disciplines, as well as those unique to each discipline. We found that the t test, analysis of variance (ANOVA), linear regression, chi-squared test, and other classical statistical methods have been and remain the most mentioned analytic methods in biomedical, life science, and social science research articles. However, mentions of these methods have declined as a percentage of the published literature between 2009 and 2020. On the other hand, multivariate statistical and machine learning approaches, such as artificial neural networks (ANNs), have seen a significant increase in their share of scientific publications. We also found unique groupings of analytic methods associated with each BLS science discipline, such as the use of structural equation modeling (SEM) in psychology, survival models in oncology, and manifold learning in ecology. We discuss the implications of these findings for education in statistics and research methods, as well as within- and cross-disciplinary collaboration.
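The headline trend, the share of articles mentioning a given method in each year, reduces to a simple ratio. The sketch below uses naive case-insensitive substring matching over hypothetical (year, text) records; the study's actual text-mining pipeline is not described in this abstract, so this only illustrates the ratio itself.

```python
from collections import defaultdict


def mention_share_by_year(articles, method):
    """Fraction of articles per year whose text mentions `method`.

    `articles` is an iterable of hypothetical (year, text) pairs. Matching is a
    plain case-insensitive substring test, with no synonym or section handling.
    """
    total = defaultdict(int)
    hits = defaultdict(int)
    needle = method.lower()
    for year, text in articles:
        total[year] += 1
        if needle in text.lower():
            hits[year] += 1
    return {year: hits[year] / total[year] for year in sorted(total)}
```

Tracking this share rather than raw counts matters because the total volume of publications grows over time, so a method can be mentioned more often in absolute terms while its share declines.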
Equivariant Networks for Pixelized Spheres
Pixelizations of Platonic solids such as the cube and icosahedron have been widely used to represent spherical data, from climate records to Cosmic Microwave Background maps. Platonic solids have well-known global symmetries. Once we pixelize each face of the solid, each face also possesses its own local symmetries in the form of Euclidean isometries. One way to combine these symmetries is through a hierarchy. However, this approach does not adequately model the interplay between the two levels of symmetry transformations. We show how to model this interplay using ideas from group theory, identify the equivariant linear maps, and introduce equivariant padding that respects these symmetries. Deep networks that use these maps as their building blocks generalize gauge equivariant CNNs on pixelized spheres. These deep networks achieve state-of-the-art results on semantic segmentation for climate data and omnidirectional image processing. Code is available at https://git.io/JGiZA.
Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards
A major challenge in reinforcement learning is the design of exploration strategies, especially for environments with sparse reward structures and continuous state and action spaces. Intuitively, if the reinforcement signal is very scarce, the agent should rely on some form of short-term memory in order to cover its environment efficiently. We propose a new exploration method, based on two intuitions: (1) the choice of the next exploratory action should depend not only on the (Markovian) state of the environment, but also on the agent's trajectory so far, and (2) the agent should utilize a measure of spread in the state space to avoid getting stuck in a small region. Our method leverages concepts often used in statistical physics to provide explanations for the behavior of simplified (polymer) chains in order to generate persistent (locally self-avoiding) trajectories in state space. We discuss the theoretical properties of locally self-avoiding walks and their ability to provide a kind of short-term memory through a decaying temporal correlation within the trajectory. We provide empirical evaluations of our approach in a simulated 2D navigation task, as well as higher-dimensional MuJoCo continuous control locomotion tasks with sparse rewards.
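A simple way to see how decaying temporal correlation acts as short-term memory is a correlated random walk in 2D, where each step direction follows an AR(1) process d_{t+1} = ρ d_t + sqrt(1 - ρ²) ε_t, so the correlation between steps k apart decays as ρ^k. This is a stand-in for illustration only, not the locally self-avoiding walk from the paper, and the function and parameter names are hypothetical.

```python
import numpy as np


def persistent_walk(n_steps, rho, step_size=1.0, seed=0):
    """2D correlated random walk with step-direction correlation rho in [0, 1).

    Step directions follow a stationary AR(1) process with unit variance per
    coordinate, so Corr(d_t, d_{t+k}) = rho**k. Returns positions of shape
    (n_steps + 1, 2), starting at the origin.
    """
    rng = np.random.default_rng(seed)
    pos = np.zeros((n_steps + 1, 2))
    d = rng.standard_normal(2)  # draw the initial direction from the stationary law
    for t in range(n_steps):
        d = rho * d + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(2)
        pos[t + 1] = pos[t] + step_size * d
    return pos
```

Larger ρ produces longer straight runs and, on average, a larger displacement from the start for the same number of steps, which is the intuition behind persistent exploration under sparse rewards.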