Publications

Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees
Sharan Vaswani
Amirreza Kazemi
Reza Babanezhad Harikandeh
Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and any value-based method as the critic. The critic is usually trained by minimizing the TD error, an objective that is potentially decorrelated with the true goal of achieving a high reward with the actor. We address this mismatch by designing a joint objective for training the actor and critic in a decision-aware fashion. We use the proposed objective to design a generic AC algorithm that can easily handle any function approximation. We explicitly characterize the conditions under which the resulting algorithm guarantees monotonic policy improvement, regardless of the choice of the policy and critic parameterization. Instantiating the generic algorithm results in an actor that involves maximizing a sequence of surrogate functions (similar to TRPO, PPO) and a critic that involves minimizing a closely connected objective. Using simple bandit examples, we provably establish the benefit of the proposed critic objective over the standard squared error. Finally, we empirically demonstrate the benefit of our decision-aware actor-critic framework on simple RL problems.
A Definition of Continual Reinforcement Learning
David Abel
Andre Barreto
Benjamin Van Roy
Hado van Hasselt
Satinder Singh
DiffPack: A Torsional Diffusion Model for Autoregressive Protein Side-Chain Packing
Yang Zhang
Zuobai Zhang
Bozitao Zhong
Sanchit Misra
Proteins play a critical role in carrying out biological functions, and their 3D structures are essential in determining their functions. Accurately predicting the conformation of protein side-chains given their backbones is important for applications in protein structure prediction, design and protein-protein interactions. Traditional methods are computationally intensive and have limited accuracy, while existing machine learning methods treat the problem as a regression task and overlook the restrictions imposed by the constant covalent bond lengths and angles. In this work, we present DiffPack, a torsional diffusion model that learns the joint distribution of side-chain torsional angles, the only degrees of freedom in side-chain packing, by diffusing and denoising on the torsional space. To avoid issues arising from simultaneous perturbation of all four torsional angles, we propose autoregressively generating the four torsional angles from
A Diffusion-Model of Joint Interactive Navigation
Matthew Niedoba
Jonathan Wilder Lavington
Yunpeng Liu
Vasileios Lioutas
Justice Sefas
Xiaoxuan Liang
Dylan Green
Setareh Dabiri
Berend Zwartsenberg
Adam Ścibior
Simulation of autonomous vehicle systems requires that simulated traffic participants exhibit diverse and realistic behaviors. The use of prerecorded real-world traffic scenarios in simulation ensures realism, but the rarity of safety-critical events makes large-scale collection of driving scenarios expensive. In this paper, we present DJINN, a diffusion-based method of generating traffic scenarios. Our approach jointly diffuses the trajectories of all agents, conditioned on a flexible set of state observations from the past, present, or future. On popular trajectory forecasting datasets, we report state-of-the-art performance on joint trajectory metrics. In addition, we demonstrate how DJINN flexibly enables direct test-time sampling from a variety of valuable conditional distributions, including goal-based sampling, behavior-class sampling, and scenario editing.
Double Gumbel Q-Learning
David Yu-Tung Hui
On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes
Jia Lin Hau
Mohammad Ghavamzadeh
Marek Petrik
DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets
Lazar Atanackovic
Alexander Tong
Jason Hartford
Leo J Lee
Bo Wang
Equivariant Adaptation of Large Pretrained Models
Arnab Kumar Mondal
Siba Smarak Panigrahi
Sékou-Oumar Kaba
Sai Rajeswar
Equivariant networks are specifically designed to ensure consistent behavior with respect to a set of input transformations, leading to higher sample efficiency and more accurate and robust predictions. However, redesigning each component of prevalent deep neural network architectures to achieve chosen equivariance is a difficult problem and can result in a computationally expensive network during both training and inference. A recently proposed alternative towards equivariance that removes the architectural constraints is to use a simple canonicalization network that transforms the input to a canonical form before feeding it to an unconstrained prediction network. We show here that this approach can effectively be used to make a large pretrained network equivariant. However, we observe that the produced canonical orientations can be misaligned with those of the training distribution, hindering performance. Using dataset-dependent priors to inform the canonicalization function, we are able to make large pretrained models equivariant while maintaining their performance. This significantly improves the robustness of these models to deterministic transformations of the data, such as rotations. We believe this equivariant adaptation of large pretrained models can help their domain-specific applications with known symmetry priors.
For SALE: State-Action Representation Learning for Deep Reinforcement Learning
Scott Fujimoto
Wei-Di Chang
Edward J. Smith
Shixiang Shane Gu
In the field of reinforcement learning (RL), representation learning is a proven tool for complex image-based tasks, but is often overlooked for environments with low-level states, such as physical control problems. This paper introduces SALE, a novel approach for learning embeddings that model the nuanced interaction between state and action, enabling effective representation learning from low-level states. We extensively study the design space of these embeddings and highlight important design considerations. We integrate SALE and an adaptation of checkpoints for RL into TD3 to form the TD7 algorithm, which significantly outperforms existing continuous control algorithms. On OpenAI Gym benchmark tasks, TD7 has an average performance gain of 276.7% and 50.7% over TD3 at 300k and 5M time steps, respectively, and works in both the online and offline settings.
GAUCHE: A Library for Gaussian Processes in Chemistry
Ryan-Rhys Griffiths
Leo Klarner
Henry Moss
Aditya Ravuri
Sang T. Truong
Yuanqi Du
Samuel Don Stanton
Gary Tom
Bojana Rankovic
Arian Rokkum Jamasb
Aryan Deshwal
Julius Schwartz
Austin Tripp
Gregory Kell
Simon Frieder
Anthony Bourached
Alex James Chan
Jacob Moss
Chengzhi Guo
Johannes P. Dürholt
…
Saudamini Chaurasia
Ji Won Park
Felix Strieth-Kalthoff
Alpha Lee
Bingqing Cheng
Alan Aspuru-Guzik
Philippe Schwaller
We introduce GAUCHE, a library for GAUssian processes in CHEmistry. Gaussian processes have long been a cornerstone of probabilistic machine learning, affording particular advantages for uncertainty quantification and Bayesian optimisation. Extending Gaussian processes to chemical representations, however, is nontrivial, necessitating kernels defined over structured inputs such as graphs, strings and bit vectors. By defining such kernels in GAUCHE, we seek to open the door to powerful tools for uncertainty quantification and Bayesian optimisation in chemistry. Motivated by scenarios frequently encountered in experimental chemistry, we showcase applications for GAUCHE in molecular discovery and chemical reaction optimisation. The codebase is made available at https://github.com/leojklarner/gauche
Group Robust Classification Without Any Group Information
Christos Tsirigotis
Joao Monteiro
Pau Rodriguez
David Vazquez
Guiding The Last Layer in Federated Learning with Pre-Trained Models
Gwen Legate
Nicolas Bernier
Lucas Caccia
Edouard Oyallon