Publications

Crystal-GFN: sampling materials with desirable properties and constraints
Mistal
Alex Hernandez-Garcia
Alexandra Volokhova
Alexandre AGM Duval
Divya Sharma
Pierre Luc Carrier
Michał Koziarski
Victor Schmidt
DGFN: Double Generative Flow Networks
Elaine Lau
Nikhil Murali Vemgal
Emmanuel Bengio
Discrete, compositional, and symbolic representations through attractor dynamics
Andrew Nam
Eric Elmoznino
Nikolay Malkin
Chen Sun
Compositionality is an important feature of discrete symbolic systems, such as language and programs, as it enables them to have infinite capacity despite a finite symbol set. It serves as a useful abstraction for reasoning in both cognitive science and in AI, yet the interface between continuous and symbolic processing is often imposed by fiat at the algorithmic level, such as by means of quantization or a softmax sampling step. In this work, we explore how discretization could be implemented in a more neurally plausible manner through the modeling of attractor dynamics that partition the continuous representation space into basins that correspond to sequences of symbols. Building on established work in attractor networks and introducing novel training methods, we show that imposing structure in the symbolic space can produce compositionality in the attractor-supported representation space of rich sensory inputs. Lastly, we argue that our model exhibits the process of an information bottleneck that is thought to play a role in conscious experience, decomposing the rich information of a sensory input into stable components encoding symbolic information.
Finding Increasingly Large Extremal Graphs with AlphaZero and Tabu Search
Abbas Mehrabian
Ankit Anand
Hyunjik Kim
Nicolas Sonnerat
Matej Balog
Gheorghe Comanici
Tudor Berariu
Andrew Lee
Anian Ruoss
Anna Bulanova
Daniel Toyama
Sam Blackwell
Bernardino Romera Paredes
Petar Veličković
Laurent Orseau
Joonkyung Lee
Anurag Murty Naredla
Adam Zsolt Wagner
Forecaster: Towards Temporally Abstract Tree-Search Planning from Pixels
Thomas Jiralerspong
Flemming Kondrup
The ability to plan at many different levels of abstraction enables agents to envision the long-term repercussions of their decisions and thus enables sample-efficient learning. This becomes particularly beneficial in complex environments with high-dimensional state spaces such as pixels, where the goal is distant and the reward sparse. We introduce Forecaster, a deep hierarchical reinforcement learning approach which plans over high-level goals leveraging a temporally abstract world model. Forecaster learns an abstract model of its environment by modelling the transition dynamics at an abstract level and training a world model on such transitions. It then uses this world model to choose optimal high-level goals through a tree-search planning procedure. It additionally trains a low-level policy that learns to reach those goals. Our method covers not only building world models with longer horizons, but also planning with such models in downstream tasks. We empirically demonstrate Forecaster's potential in both single-task learning and generalization to new tasks in the AntMaze domain.
Improving Generalization in Reinforcement Learning Training Regimes for Social Robot Navigation
In order for autonomous mobile robots to navigate in human spaces, they must abide by our social norms. Reinforcement learning (RL) has emerged as an effective method to train robot sequential decision-making policies that are able to respect these norms. However, a large portion of existing work in the field conducts both RL training and testing in simplistic environments. This limits the generalization potential of these models to unseen environments, and undermines the meaningfulness of their reported results. We propose a method to improve the generalization performance of RL social navigation methods using curriculum learning. By employing multiple environment types and by modeling pedestrians using multiple dynamics models, we are able to progressively diversify and escalate difficulty in training. Our results show that the use of curriculum learning in training can be used to achieve better generalization performance than previous training methods. We also show that many existing state-of-the-art RL social navigation works do not evaluate their methods outside of their training environments, and thus their reported results do not reflect their policies' failure to adequately generalize to out-of-distribution scenarios. In response, we validate our training approach on larger and more crowded testing environments than those used in training, allowing for more meaningful measurements of model performance.
Improving Intrinsic Exploration by Creating Stationary Objectives
Roger Creus Castanyer
Joshua Romoff
Learning Macro Variables with Auto-encoders
Eric Elmoznino
Maitreyi Swaroop
Learning Optimizers for Local SGD
Charles-Étienne Joseph
Benjamin Thérien
Abhinav Moudgil
Boris Knyazev
Learning to Scale Logits for Temperature-Conditional GFlowNets
Minsu Kim
Joohwan Ko
Dinghuai Zhang
Ling Pan
Taeyoung Yun
Woo Chang Kim
Jinkyoo Park
GFlowNets are probabilistic models that learn a stochastic policy that sequentially generates compositional structures, such as molecular graphs. They are trained with the objective of sampling such objects with probability proportional to the object's reward. Among GFlowNets, the temperature-conditional GFlowNets represent a family of policies indexed by temperature, and each is associated with the correspondingly tempered reward function. The major benefit of temperature-conditional GFlowNets is the controllability of GFlowNets' exploration and exploitation through adjusting temperature. We propose Learning to Scale Logits for temperature-conditional GFlowNets (LSL-GFN), a novel architectural design that greatly accelerates the training of temperature-conditional GFlowNets. It is based on the idea that previously proposed temperature-conditioning approaches introduced numerical challenges in the training of the deep network because different temperatures may give rise to very different gradient profiles and ideal scales of the policy's logits. We find that the challenge is greatly reduced if a learned function of the temperature is used to scale the policy's logits directly. We empirically show that our strategy dramatically improves the performance of GFlowNets, outperforming other baselines, including reinforcement learning and sampling methods, in terms of discovering diverse modes in multiple biochemical tasks.
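The idea of scaling a policy's logits by a function of the temperature can be illustrated with a minimal sketch. This is not the authors' implementation: the names `scaled_policy`, `w`, and `b` are hypothetical, the "learned function" is reduced to a single affine map passed through a softplus, and `beta` is assumed here to be an inverse temperature (larger means sharper).

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus; keeps the scale strictly positive.
    return np.logaddexp(0.0, x)

def softmax(z):
    # Subtract the max before exponentiating for numerical stability.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def scaled_policy(logits, beta, w, b):
    """Return action probabilities after scaling the logits.

    w and b are hypothetical parameters standing in for a learned
    scalar function of the inverse temperature beta (an assumption
    of this sketch, not something specified in the abstract).
    """
    scale = softplus(w * beta + b)
    return softmax(scale * logits)

# Same unscaled logits, two different inverse temperatures.
logits = np.array([2.0, 1.0, 0.0])
p_low = scaled_policy(logits, beta=0.5, w=1.0, b=0.0)
p_high = scaled_policy(logits, beta=4.0, w=1.0, b=0.0)
```

In this toy setup a larger `beta` yields a larger scale and therefore a more peaked action distribution, while the unscaled logits are shared across temperatures.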
Learning Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy
Max Schwarzer
Jesse Farebrother
Joshua Greaves
Kevin Roccapriore
Ekin Dogus Cubuk
Rishabh Agarwal
Sergei Kalinin
Igor Mordatch
We introduce a machine learning approach to determine the transition rates of silicon atoms on a single layer of carbon atoms, when stimulated by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural network to predict transition rates. These rates are then applied to guide a single silicon atom throughout the lattice to pre-determined target destinations. We present empirical analyses that demonstrate the efficacy and generality of our approach.