Publications

Memory Augmented Optimizers for Deep Learning
Popular approaches for minimizing loss in data-driven learning often involve an abstraction or an explicit retention of the history of gradients for efficient parameter updates. The aggregated history of gradients nudges the parameter updates in the right direction even when the gradients at any given step are not informative. Although the history of gradients summarized in meta-parameters or explicitly stored in memory has been shown effective in theory and practice, the question of whether
MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling
Ethan Manilow
Yi Deng
Rigel Swavely
Cheng-Zhi Anna Huang
Jesse Engel
Musical expression requires control of both what notes are played and how they are performed. Conventional audio synthesizers provide detailed expressive controls, but at the cost of realism. Black-box neural audio synthesis and concatenative samplers can produce realistic audio, but have few mechanisms for control. In this work, we introduce MIDI-DDSP, a hierarchical model of musical instruments that enables both realistic neural audio synthesis and detailed user control. Starting from interpretable Differentiable Digital Signal Processing (DDSP) synthesis parameters, we infer musical notes and high-level properties of their expressive performance (such as timbre, vibrato, dynamics, and articulation). This creates a 3-level hierarchy (notes, performance, synthesis) that affords individuals the option to intervene at each level, or utilize trained priors (performance given notes, synthesis given performance) for creative assistance. Through quantitative experiments and listening tests, we demonstrate that this hierarchy can reconstruct high-fidelity audio, accurately predict performance attributes for a note sequence, independently manipulate the attributes of a given performance, and as a complete system, generate realistic audio from a novel note sequence. By utilizing an interpretable hierarchy, with multiple levels of granularity, MIDI-DDSP opens the door to assistive tools to empower individuals across a diverse range of musical experience.
New Insights on Reducing Abrupt Representation Change in Online Continual Learning
In the online continual learning paradigm, agents must learn from a changing distribution while respecting memory and compute constraints. Experience Replay (ER), where a small subset of past data is stored and replayed alongside new data, has emerged as a simple and effective learning strategy. In this work, we focus on the change in representations of observed data that arises when previously unobserved classes appear in the incoming data stream, and new classes must be distinguished from previous ones. We shed new light on this question by showing that applying ER causes the newly added classes' representations to overlap significantly with the previous classes, leading to highly disruptive parameter updates. Based on this empirical analysis, we propose a new method which mitigates this issue by shielding the learned representations from drastic adaptation to accommodate new classes. We show that using an asymmetric update rule pushes new classes to adapt to the older ones (rather than the reverse), which is more effective especially at task boundaries, where much of the forgetting typically occurs. Empirical results show significant gains over strong baselines on standard continual learning benchmarks.
Policy Gradients Incorporating the Future
David Venuto
Ofir Nachum
Reasoning about the future -- understanding how decisions in the present time affect outcomes in the future -- is one of the central challenges for reinforcement learning (RL), especially in highly stochastic or partially observable environments. While predicting the future directly is hard, in this work we introduce a method that allows an agent to "look into the future" without explicitly predicting it. Namely, we propose to allow an agent, during its training on past experience, to observe what actually happened in the future at that time, while enforcing an information bottleneck to keep the agent from relying too heavily on this privileged information. This gives our agent the opportunity to utilize rich and useful information about the future trajectory dynamics in addition to the present. Our method, Policy Gradients Incorporating the Future (PGIF), is easy to implement and versatile, being applicable to virtually any policy gradient algorithm. We apply our proposed method to a number of off-the-shelf RL algorithms and show that PGIF is able to achieve higher reward faster in a variety of online and offline RL domains, as well as sparse-reward and partially observable environments.
Pre-training Molecular Graph Representation with 3D Geometry
Hanchen Wang
Weiyang Liu
Joan Lasenby
Hongyu Guo
Molecular graph representation learning is a fundamental problem in modern drug and material discovery. Molecular graphs are typically modeled by their 2D topological structures, but it has been recently discovered that 3D geometric information plays a more vital role in predicting molecular functionalities. However, the lack of 3D information in real-world scenarios has significantly impeded the learning of geometric graph representation. To cope with this challenge, we propose the Graph Multi-View Pre-training (GraphMVP) framework where self-supervised learning (SSL) is performed by leveraging the correspondence and consistency between 2D topological structures and 3D geometric views. GraphMVP effectively learns a 2D molecular graph encoder that is enhanced by richer and more discriminative 3D geometry. We further provide theoretical insights to justify the effectiveness of GraphMVP. Finally, comprehensive experiments show that GraphMVP can consistently outperform existing graph SSL methods. Code is available on GitHub: https://github.com/chao1224/GraphMVP.
Properties from Mechanisms: An Equivariance Perspective on Identifiable Representation Learning
Jason Hartford
A key goal of unsupervised representation learning is "inverting" a data generating process to recover its latent properties. Existing work that provably achieves this goal relies on strong assumptions on relationships between the latent variables (e.g., independence conditional on auxiliary information). In this paper, we take a very different perspective on the problem and ask, "Can we instead identify latent properties by leveraging knowledge of the mechanisms that govern their evolution?" We provide a complete characterization of the sources of non-identifiability as we vary knowledge about a set of possible mechanisms. In particular, we prove that if we know the exact mechanisms under which the latent properties evolve, then identification can be achieved up to any equivariances that are shared by the underlying mechanisms. We generalize this characterization to settings where we only know some hypothesis class over possible mechanisms, as well as settings where the mechanisms are stochastic. We demonstrate the power of this mechanism-based perspective by showing that we can leverage our results to generalize existing identifiable representation learning results. These results suggest that by exploiting inductive biases on mechanisms, it is possible to design a range of new identifiable representation learning approaches.
R5: Rule Discovery with Reinforced and Recurrent Relational Reasoning
Shengyao Lu
Keith G Mills
SHANGLING JUI
Di Niu
Systematicity, i.e., the ability to recombine known parts and rules to form new sequences while reasoning over relational data, is critical to machine intelligence. A model with strong systematicity is able to train on small-scale tasks and generalize to large-scale tasks. In this paper, we propose R5, a relational reasoning framework based on reinforcement learning that reasons over relational graph data and explicitly mines underlying compositional logical rules from observations. R5 has strong systematicity and is robust to noisy data. It consists of a policy value network equipped with Monte Carlo Tree Search to perform recurrent relational prediction and a backtrack rewriting mechanism for rule mining. By alternately applying the two components, R5 progressively learns a set of explicit rules from data and performs explainable and generalizable relation prediction. We conduct extensive evaluations on multiple datasets. Experimental results show that R5 outperforms various embedding-based and rule induction baselines on relation prediction tasks while achieving a high recall rate in discovering ground truth rules.
Boosting Exploration in Multi-Task Reinforcement Learning using Adversarial Networks
Lacking social support is associated with structural divergences in hippocampus-default network co-variation patterns
Chris Zajner
R. Nathan Spreng
Elaborate social interaction is a pivotal asset of the human species. The complexity of people’s social lives may constitute the dominating factor in the vibrancy of many individuals’ environment. The neural substrates linked to social cognition thus appear especially susceptible when people endure periods of social isolation: here, we zoom in on the systematic inter-relationships between two such neural substrates, the allocortical hippocampus (HC) and the neocortical default network (DN). Previous human social neuroscience studies have focused on the DN, while HC subfields have been studied in most detail in rodents and monkeys. To bring into contact these two separate research streams, we directly quantified how DN subregions are coherently co-expressed with specific HC subfields in the context of social isolation. A two-pronged decomposition of structural brain scans from ∼40 000 UK Biobank participants linked lack of social support to mostly lateral subregions in the DN patterns. This lateral DN association co-occurred with HC patterns that implicated especially subiculum, presubiculum, CA2, CA3 and dentate gyrus. Overall, the subregion divergences within spatially overlapping signatures of HC–DN co-variation followed a clear segregation into the left and right brain hemispheres. Separable regimes of structural HC–DN co-variation also showed distinct associations with the genetic predisposition for lacking social support at the population level.
Multilevel development of cognitive abilities in an artificial neural network
Konstantin Volzhenin
Jean-Pierre Changeux
Multiple biological mechanisms support the unique ability of the brain to develop complex cognitive abilities. Nevertheless, it remains unclear which mechanisms are necessary and sufficient. We propose a neurocomputational model of the developing brain spanning sensorimotor, cognitive, and conscious levels. The model solves three tasks of increasing complexity: from visual recognition to cognitive manipulation and maintenance of conscious percepts. Results highlight two fundamental mechanisms for the multilevel development of cognitive abilities in biological neural networks: 1) synaptic epigenesis, with Hebbian learning at the local scale and reinforcement learning at the global scale; and 2) self-organized dynamics, through spontaneous activity and balanced excitatory/inhibitory ratio of neurons. We emphasize how these core features of human intelligence could guide future development in artificial intelligence.
The Paradox of Choice: Using Attention in Hierarchical Reinforcement Learning
Neural correlates of local parallelism during naturalistic vision
John Wilder
Morteza Rezanejad
Sven Dickinson
Allan Jepson
Dirk B. Walther
Human observers can rapidly perceive complex real-world scenes. Grouping visual elements into meaningful units is an integral part of this process. Yet, so far, the neural underpinnings of perceptual grouping have only been studied with simple lab stimuli. We here uncover the neural mechanisms of one important perceptual grouping cue, local parallelism. Using a new, image-computable algorithm for detecting local symmetry in line drawings and photographs, we manipulated the local parallelism content of real-world scenes. We decoded scene categories from patterns of brain activity obtained via functional magnetic resonance imaging (fMRI) in 38 human observers while they viewed the manipulated scenes. Decoding was significantly more accurate for scenes containing strong local parallelism compared to weak local parallelism in the parahippocampal place area (PPA), indicating a central role of parallelism in scene perception. To investigate the origin of the parallelism signal we performed a model-based fMRI analysis of the public BOLD5000 dataset, looking for voxels whose activation time course matches that of the locally parallel content of the 4916 photographs viewed by the participants in the experiment. We found a strong relationship with average local symmetry in visual areas V1-4, PPA, and retrosplenial cortex (RSC). Notably, the parallelism-related signal peaked first in V4, suggesting V4 as the site for extracting parallelism from the visual input. We conclude that local parallelism is a perceptual grouping cue that influences neuronal activity throughout the visual hierarchy, presumably starting at V4. Parallelism plays a key role in the representation of scene categories in PPA.