Publications

Equivariant Adaptation of Large Pretrained Models
Arnab Kumar Mondal
Siba Smarak Panigrahi
Sékou-Oumar Kaba
Sai Rajeswar
Equivariant networks are specifically designed to ensure consistent behavior with respect to a set of input transformations, leading to high… (see more)er sample efficiency and more accurate and robust predictions. However, redesigning each component of prevalent deep neural network architectures to achieve chosen equivariance is a difficult problem and can result in a computationally expensive network during both training and inference. A recently proposed alternative towards equivariance that removes the architectural constraints is to use a simple canonicalization network that transforms the input to a canonical form before feeding it to an unconstrained prediction network. We show here that this approach can effectively be used to make a large pretrained network equivariant. However, we observe that the produced canonical orientations can be misaligned with those of the training distribution, hindering performance. Using dataset-dependent priors to inform the canonicalization function, we are able to make large pretrained models equivariant while maintaining their performance. This significantly improves the robustness of these models to deterministic transformations of the data, such as rotations. We believe this equivariant adaptation of large pretrained models can help their domain-specific applications with known symmetry priors.
For SALE: State-Action Representation Learning for Deep Reinforcement Learning
Scott Fujimoto
Wei-Di Chang
Edward J. Smith
Shixiang Shane Gu
In the field of reinforcement learning (RL), representation learning is a proven tool for complex image-based tasks, but is often overlooked… (see more) for environments with low-level states, such as physical control problems. This paper introduces SALE, a novel approach for learning embeddings that model the nuanced interaction between state and action, enabling effective representation learning from low-level states. We extensively study the design space of these embeddings and highlight important design considerations. We integrate SALE and an adaptation of checkpoints for RL into TD3 to form the TD7 algorithm, which significantly outperforms existing continuous control algorithms. On OpenAI gym benchmark tasks, TD7 has an average performance gain of 276.7% and 50.7% over TD3 at 300k and 5M time steps, respectively, and works in both the online and offline settings.
GAUCHE: A Library for Gaussian Processes in Chemistry
Ryan-Rhys Griffiths
Leo Klarner
Henry Moss
Aditya Ravuri
Sang T. Truong
Yuanqi Du
Samuel Don Stanton
Gary Tom
Bojana Rankovic
Arian Rokkum Jamasb
Aryan Deshwal
Julius Schwartz
Austin Tripp
Gregory Kell
Simon Frieder
Anthony Bourached
Alex James Chan
Jacob Moss
Chengzhi Guo
Johannes P. Dürholt … (see 8 more)
Saudamini Chaurasia
Ji Won Park
Felix Strieth-Kalthoff
Alpha Lee
Bingqing Cheng
Alan Aspuru-Guzik
Philippe Schwaller
We introduce GAUCHE, a library for GAUssian processes in CHEmistry. Gaussian processes have long been a cornerstone of probabilistic machine… (see more) learning, affording particular advantages for uncertainty quantification and Bayesian optimisation. Extending Gaussian processes to chemical representations however is nontrivial, necessitating kernels defined over structured inputs such as graphs, strings and bit vectors. By defining such kernels in GAUCHE, we seek to open the door to powerful tools for uncertainty quantification and Bayesian optimisation in chemistry. Motivated by scenarios frequently encountered in experimental chemistry, we showcase applications for GAUCHE in molecular discovery and chemical reaction optimisation. The codebase is made available at https://github.com/leojklarner/gauche
Group Robust Classification Without Any Group Information
Christos Tsirigotis
Joao Monteiro
Pau Rodriguez
David Vazquez
Guiding The Last Layer in Federated Learning with Pre-Trained Models
Gwen Legate
Nicolas Bernier
Lucas Caccia
Edouard Oyallon
Importance-aware Co-teaching for Offline Model-based Optimization
Ye Yuan
Can Chen
Zixuan Liu
Willie Neiswanger
Offline model-based optimization aims to find a design that maximizes a property of interest using only an offline dataset, with application… (see more)s in robot, protein, and molecule design, among others. A prevalent approach is gradient ascent, where a proxy model is trained on the offline dataset and then used to optimize the design. This method suffers from an out-of-distribution issue, where the proxy is not accurate for unseen designs. To mitigate this issue, we explore using a pseudo-labeler to generate valuable data for fine-tuning the proxy. Specifically, we propose
Importance-aware Co-teaching for Offline Model-based Optimization
Ye Yuan
Can Chen
Zixuan Liu
Willie Neiswanger
Improving Compositional Generalization using Iterated Learning and Simplicial Embeddings
Yi Ren
Samuel Lavoie
Mikhail Galkin
Danica J. Sutherland
Improving *day-ahead* Solar Irradiance Time Series Forecasting by Leveraging Spatio-Temporal Context
Oussama Boussif
Ghait Boukachab
Dan Assouline
Stefano Massaroli
Tianle Yuan
Loubna Benabbou
Solar power harbors immense potential in mitigating climate change by substantially reducing CO…
Improving Language Plasticity via Pretraining with Active Forgetting
Yihong Chen
Kelly Marchisio
Roberta Raileanu
Pontus Stenetorp
Sebastian Riedel
Mikel Artetxe
Pretrained language models (PLMs) are today the primary model for natural language processing. Despite their impressive downstream performan… (see more)ce, it can be difficult to apply PLMs to new languages, a barrier to making their capabilities universally accessible. While prior work has shown it possible to address this issue by learning a new embedding layer for the new language, doing so is both data and compute inefficient. We propose to use an active forgetting mechanism during pretraining, as a simple way of creating PLMs that can quickly adapt to new languages. Concretely, by resetting the embedding layer every K updates during pretraining, we encourage the PLM to improve its ability of learning new embeddings within limited number of updates, similar to a meta-learning effect. Experiments with RoBERTa show that models pretrained with our forgetting mechanism not only demonstrate faster convergence during language adaptation, but also outperform standard ones in a low-data regime, particularly for languages that are distant from English. Code will be available at https://github.com/facebookresearch/language-model-plasticity.
Joint Bayesian Inference of Graphical Structure and Parameters with a Single Generative Flow Network
Tristan Deleu
Mizu Nishikawa-Toomey
Jithendaraa Subramanian
Nikolay Malkin
Generative Flow Networks (GFlowNets), a class of generative models over discrete and structured sample spaces, have been previously applied … (see more)to the problem of inferring the marginal posterior distribution over the directed acyclic graph (DAG) of a Bayesian Network, given a dataset of observations. Based on recent advances extending this framework to non-discrete sample spaces, we propose in this paper to approximate the joint posterior over not only the structure of a Bayesian Network, but also the parameters of its conditional probability distributions. We use a single GFlowNet whose sampling policy follows a two-phase process: the DAG is first generated sequentially one edge at a time, and then the corresponding parameters are picked once the full structure is known. Since the parameters are included in the posterior distribution, this leaves more flexibility for the local probability models of the Bayesian Network, making our approach applicable even to non-linear models parametrized by neural networks. We show that our method, called JSP-GFN, offers an accurate approximation of the joint posterior, while comparing favorably against existing methods on both simulated and real data.
Joint Prompt Optimization of Stacked LLMs using Variational Inference
Eric Yuan
Xingdi Yuan
Marc-Alexandre Côté
Matheus Pereira
Adam Trischler
Ziang Xiao
Arian Hosseini
Friederike Niedtner
Large language models (LLMs) can be seen as atomic units of computation mapping sequences to a distribution over sequences. Thus, they can b… (see more)e seen as stochastic language layers in a language network, where the learnable parameters are the natural language prompts at each layer. By stacking two such layers and feeding the output of one layer to the next, we obtain a Deep Language Network (DLN). We first show how to effectively perform prompt optimization for a 1-Layer language network (DLN-1). Then, we present an extension that applies to 2-layer DLNs (DLN-2), where two prompts must be learned. The key idea is to consider the output of the first layer as a latent variable, which requires inference, and prompts to be learned as the parameters of the generative distribution. We first test the effectiveness of DLN-1 in multiple reasoning and natural language understanding tasks. Then, we show that DLN-2 can reach higher performance than a single layer, showing promise that we might reach comparable performance to GPT-4, even when each LLM in the network is smaller and less powerful.