Publications

DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets

Jason Hartford

Leo J. Lee

Bo Wang

One of the grand challenges of cell biology is inferring the gene regulatory network (GRN) which describes interactions between genes and their products that control gene expression and cellular function. We can treat this as a causal discovery problem but with two non-standard challenges: (1) regulatory networks are inherently cyclic so we should not model a GRN as a directed acyclic graph (DAG), and (2) observations have significant measurement noise, so for typical sample sizes there will always be a large equivalence class of graphs that are likely given the data, and we want methods that capture this uncertainty. Existing methods either focus on challenge (1), identifying cyclic structure from dynamics, or on challenge (2) learning complex Bayesian posteriors over DAGs, but not both. In this paper we leverage the fact that it is possible to estimate the "velocity" of gene expression with RNA velocity techniques to develop an approach that addresses both challenges. Because we have access to velocity information, we can treat the Bayesian structure learning problem as a problem of sparse identification of a dynamical system, capturing cyclic feedback loops through time. Since our objective is to model uncertainty over discrete structures, we leverage Generative Flow Networks (GFlowNets) to estimate the posterior distribution over the combinatorial space of possible sparse dependencies. Our results indicate that our method learns posteriors that better encapsulate the distributions of cyclic structures compared to counterpart state-of-the-art Bayesian structure learning approaches.

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

Equivariant Adaptation of Large Pretrained Models

Arnab Kumar Mondal

Siba Smarak Panigrahi

Sékou-Oumar Kaba

Sai Rajeswar

Siamak Ravanbakhsh

Equivariant networks are specifically designed to ensure consistent behavior with respect to a set of input transformations, leading to higher sample efficiency and more accurate and robust predictions. However, redesigning each component of prevalent deep neural network architectures to achieve chosen equivariance is a difficult problem and can result in a computationally expensive network during both training and inference. A recently proposed alternative towards equivariance that removes the architectural constraints is to use a simple canonicalization network that transforms the input to a canonical form before feeding it to an unconstrained prediction network. We show here that this approach can effectively be used to make a large pretrained network equivariant. However, we observe that the produced canonical orientations can be misaligned with those of the training distribution, hindering performance. Using dataset-dependent priors to inform the canonicalization function, we are able to make large pretrained models equivariant while maintaining their performance. This significantly improves the robustness of these models to deterministic transformations of the data, such as rotations. We believe this equivariant adaptation of large pretrained models can help their domain-specific applications with known symmetry priors.
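The canonicalization idea in this abstract can be illustrated with a minimal sketch. It assumes the symmetry group is the four 90-degree rotations of a square 2D array, and it swaps the learned canonicalization network for a fixed hand-written rule (pick the rotation with the smallest byte representation). The names `canonicalize` and `invariant_predict` are hypothetical and not taken from the paper.

```python
import numpy as np

def canonicalize(image):
    """Map every 90-degree rotation of `image` to the same representative.

    Assumption: a fixed rule stands in for the learned canonicalization
    network described in the abstract. All four rotations of an input share
    the same orbit, so the minimum over the orbit is the same for each.
    """
    candidates = [np.rot90(image, k) for k in range(4)]
    return min(candidates, key=lambda a: a.tobytes())

def invariant_predict(predictor, image):
    """Apply an unconstrained predictor to the canonical form of the input."""
    return predictor(canonicalize(image))

# Usage: a predictor that is not rotation-invariant on its own.
predictor = lambda a: float(a[0, 0] - a[-1, -1])
x = np.arange(9.0).reshape(3, 3)
outputs = [invariant_predict(predictor, np.rot90(x, k)) for k in range(4)]
```

All four entries of `outputs` agree, because each rotated input is mapped to the same canonical array before the predictor sees it. A fixed rule like this one ignores the orientations seen during training, which is the misalignment issue the abstract addresses with dataset-dependent priors.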

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

For SALE: State-Action Representation Learning for Deep Reinforcement Learning

Scott Fujimoto

Wei-Di Chang

Edward J. Smith

Shixiang Shane Gu

Doina Precup

David Meger

In the field of reinforcement learning (RL), representation learning is a proven tool for complex image-based tasks, but is often overlooked for environments with low-level states, such as physical control problems. This paper introduces SALE, a novel approach for learning embeddings that model the nuanced interaction between state and action, enabling effective representation learning from low-level states. We extensively study the design space of these embeddings and highlight important design considerations. We integrate SALE and an adaptation of checkpoints for RL into TD3 to form the TD7 algorithm, which significantly outperforms existing continuous control algorithms. On OpenAI gym benchmark tasks, TD7 has an average performance gain of 276.7% and 50.7% over TD3 at 300k and 5M time steps, respectively, and works in both the online and offline settings.

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

GAUCHE: A Library for Gaussian Processes in Chemistry

Ryan-Rhys Griffiths

Leo Klarner

Henry Moss

Aditya Ravuri

Sang Truong

Bojana Rankovic

Samuel Stanton

Yuanqi Du

Arian Jamasb

Gary Tom

Julius Schwartz

Austin Tripp

Aryan Deshwal

Gregory Kell

Anthony Bourached

Alex J. Chan

Jacob Moss

Chengzhi Guo

Simon Frieder

Alpha A. Lee

Philippe Schwaller

Jian Tang

Johannes Durholt

Saudamini Chaurasia

Ji Won Park

Felix Strieth-Kalthoff

Bingqing Cheng

Alán Aspuru-Guzik

We introduce GAUCHE, a library for GAUssian processes in CHEmistry. Gaussian processes have long been a cornerstone of probabilistic machine learning, affording particular advantages for uncertainty quantification and Bayesian optimisation. Extending Gaussian processes to chemical representations, however, is nontrivial, necessitating kernels defined over structured inputs such as graphs, strings and bit vectors. By defining such kernels in GAUCHE, we seek to open the door to powerful tools for uncertainty quantification and Bayesian optimisation in chemistry. Motivated by scenarios frequently encountered in experimental chemistry, we showcase applications for GAUCHE in molecular discovery and chemical reaction optimisation. The codebase is made available at https://github.com/leojklarner/gauche

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

Group Robust Classification Without Any Group Information

Christos Tsirigotis

Joao Monteiro

Pau Rodríguez

David Vázquez

Aaron Courville

Empirical risk minimization (ERM) is sensitive to spurious correlations in the training data, which poses a significant risk when deploying systems trained under this paradigm in high-stakes applications. While the existing literature focuses on maximizing group-balanced or worst-group accuracy, estimating these accuracies is hindered by costly bias annotations. This study contends that current bias-unsupervised approaches to group robustness continue to rely on group information to achieve optimal performance. Firstly, these methods implicitly assume that all group combinations are represented during training. To illustrate this, we introduce a systematic generalization task on the MPI3D dataset and discover that current algorithms fail to improve the ERM baseline when combinations of observed attribute values are missing. Secondly, bias labels are still crucial for effective model selection, restricting the practicality of these methods in real-world scenarios. To address these limitations, we propose a revised methodology for training and validating debiased models in an entirely bias-unsupervised manner. We achieve this by employing pretrained self-supervised models to reliably extract bias information, which enables the integration of a logit adjustment training loss with our validation criterion. Our empirical analysis on synthetic and real-world tasks provides evidence that our approach overcomes the identified challenges and consistently enhances robust accuracy, attaining performance that is competitive with or better than that of state-of-the-art methods, which, conversely, rely on bias labels for validation.

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

Guiding The Last Layer in Federated Learning with Pre-Trained Models

Lucas Caccia

Federated Learning (FL) is an emerging paradigm that allows a model to be trained across a number of participants without sharing data. Recent works have begun to consider the effects of using pre-trained models as an initialization point for existing FL algorithms; however, these approaches ignore the vast body of efficient transfer learning literature from the centralized learning setting. Here we revisit the problem of FL from a pre-trained model considered in prior work and expand it to a set of computer vision transfer learning problems. We first observe that simply fitting a linear classification head can be efficient and effective in many cases. We then show that in the FL setting, fitting a classifier using the Nearest Class Means (NCM) can be done exactly and orders of magnitude more efficiently than existing proposals, while obtaining strong performance. Finally, we demonstrate that using a two-phase approach of obtaining the classifier and then fine-tuning the model can yield rapid convergence and improved generalization in the federated setting. We demonstrate the potential our method has to reduce communication and compute costs while achieving better model performance.
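One way to see why Nearest Class Means can be fitted exactly in a federated setting: a class mean is a sum divided by a count, so per-class sums and counts computed locally can be added on the server to recover the means of the pooled data. The sketch below assumes features have already been extracted by a frozen pre-trained model and are held as NumPy arrays. The function names are hypothetical and this is not the paper's implementation.

```python
import numpy as np

def client_statistics(features, labels, num_classes):
    """Per-class feature sums and counts, computed locally by one client."""
    dim = features.shape[1]
    sums = np.zeros((num_classes, dim))
    counts = np.zeros(num_classes)
    for c in range(num_classes):
        mask = labels == c
        sums[c] = features[mask].sum(axis=0)
        counts[c] = mask.sum()
    return sums, counts

def aggregate_class_means(client_stats):
    """Server side: add up client sums and counts, then divide once."""
    total_sums = sum(s for s, _ in client_stats)
    total_counts = sum(c for _, c in client_stats)
    # Assumption: classes seen by no client keep a zero mean instead of NaN.
    safe_counts = np.maximum(total_counts, 1)[:, None]
    return total_sums / safe_counts

def ncm_predict(features, class_means):
    """Assign each feature vector to the class with the nearest mean."""
    diffs = features[:, None, :] - class_means[None, :, :]
    return np.linalg.norm(diffs, axis=2).argmin(axis=1)
```

In this sketch each client sends a single message of size `num_classes * (dim + 1)`, and no raw features leave the client.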

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

Importance-aware Co-teaching for Offline Model-based Optimization

Ye Yuan

Can Chen

Zixuan Liu

Willie Neiswanger

Xue Liu

2023-09-20

NeurIPS.cc/2023/Conference (poster)

openreview.net

Improving Compositional Generalization using Iterated Learning and Simplicial Embeddings

Yi Ren

Samuel Lavoie

Mikhail Galkin

Danica J. Sutherland

Aaron Courville

Compositional generalization, the ability of an agent to generalize to unseen combinations of latent factors, is easy for humans but hard for deep neural networks. A line of research in cognitive science has hypothesized a process, "iterated learning," to help explain how human language developed this ability; the theory rests on simultaneous pressures towards compressibility (when an ignorant agent learns from an informed one) and expressivity (when it uses the representation for downstream tasks). Inspired by this process, we propose to improve the compositional generalization of deep networks by using iterated learning on models with simplicial embeddings, which can approximately discretize representations. This approach is further motivated by an analysis of compositionality based on Kolmogorov complexity. We show that this combination of changes improves compositional generalization over other approaches, demonstrating these improvements both on vision tasks with well-understood latent factors and on real molecular graph prediction tasks where the latent structure is unknown.

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

Improving day-ahead Solar Irradiance Time Series Forecasting by Leveraging Spatio-Temporal Context

Ghait Boukachab

Solar power harbors immense potential in mitigating climate change by substantially reducing CO…

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

Improving Language Plasticity via Pretraining with Active Forgetting

Yihong Chen

Kelly Marchisio

Roberta Raileanu

David Ifeoluwa Adelani

Pontus Stenetorp

Sebastian Riedel

Mikel Artetxe

Pretrained language models (PLMs) are today the primary model for natural language processing. Despite their impressive downstream performance, it can be difficult to apply PLMs to new languages, a barrier to making their capabilities universally accessible. While prior work has shown it possible to address this issue by learning a new embedding layer for the new language, doing so is both data and compute inefficient. We propose to use an active forgetting mechanism during pretraining, as a simple way of creating PLMs that can quickly adapt to new languages. Concretely, by resetting the embedding layer every K updates during pretraining, we encourage the PLM to improve its ability to learn new embeddings within a limited number of updates, similar to a meta-learning effect. Experiments with RoBERTa show that models pretrained with our forgetting mechanism not only demonstrate faster convergence during language adaptation, but also outperform standard ones in a low-data regime, particularly for languages that are distant from English. Code will be available at https://github.com/facebookresearch/language-model-plasticity.
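The reset schedule described in this abstract ("resetting the embedding layer every K updates") can be sketched as a toy training loop. Everything here is an assumption for illustration: the parameters are small NumPy arrays, the update is a placeholder rather than a real gradient step, and the function name `train_with_active_forgetting` is hypothetical. The released code at the URL above is the authoritative reference.

```python
import numpy as np

def train_with_active_forgetting(num_updates, reset_every,
                                 vocab_size=8, dim=4, seed=0):
    """Toy loop: re-initialize the embedding every `reset_every` updates."""
    rng = np.random.default_rng(seed)

    def init_embedding():
        return rng.normal(scale=0.02, size=(vocab_size, dim))

    embedding = init_embedding()
    body = rng.normal(scale=0.02, size=(dim, dim))  # stands in for the other layers
    reset_steps = []

    for step in range(1, num_updates + 1):
        # Placeholder update (assumption): a real loop would apply gradients here.
        embedding = embedding - 0.01 * embedding
        body = body - 0.01 * body

        if step % reset_every == 0:
            # Forget the embeddings, keep the rest of the model.
            embedding = init_embedding()
            reset_steps.append(step)

    return embedding, body, reset_steps

_, _, resets = train_with_active_forgetting(num_updates=10, reset_every=3)
```

With these arguments `resets` is `[3, 6, 9]`. How optimizer state and learning-rate schedules are handled at each reset is not stated in the abstract, so the sketch leaves that out.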

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

Joint Bayesian Inference of Graphical Structure and Parameters with a Single Generative Flow Network

Tristan Deleu

Mizu Nishikawa-Toomey

Jithendaraa Subramanian

Nikolay Malkin

Laurent Charlin

Yoshua Bengio

Generative Flow Networks (GFlowNets), a class of generative models over discrete and structured sample spaces, have been previously applied to the problem of inferring the marginal posterior distribution over the directed acyclic graph (DAG) of a Bayesian Network, given a dataset of observations. Based on recent advances extending this framework to non-discrete sample spaces, we propose in this paper to approximate the joint posterior over not only the structure of a Bayesian Network, but also the parameters of its conditional probability distributions. We use a single GFlowNet whose sampling policy follows a two-phase process: the DAG is first generated sequentially one edge at a time, and then the corresponding parameters are picked once the full structure is known. Since the parameters are included in the posterior distribution, this leaves more flexibility for the local probability models of the Bayesian Network, making our approach applicable even to non-linear models parametrized by neural networks. We show that our method, called JSP-GFN, offers an accurate approximation of the joint posterior, while comparing favorably against existing methods on both simulated and real data.

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

Joint Prompt Optimization of Stacked LLMs using Variational Inference

Alessandro Sordoni

Xingdi Yuan

Marc-Alexandre Côté

Matheus Pereira

Adam Trischler

Ziang Xiao

Arian Hosseini

Friederike Niedtner

Nicolas Le Roux

Large language models (LLMs) can be seen as atomic units of computation mapping sequences to a distribution over sequences. Thus, they can be seen as stochastic language layers in a language network, where the learnable parameters are the natural language prompts at each layer. By stacking two such layers and feeding the output of one layer to the next, we obtain a Deep Language Network (DLN). We first show how to effectively perform prompt optimization for a 1-Layer language network (DLN-1). Then, we present an extension that applies to 2-layer DLNs (DLN-2), where two prompts must be learned. The key idea is to consider the output of the first layer as a latent variable, which requires inference, and prompts to be learned as the parameters of the generative distribution. We first test the effectiveness of DLN-1 in multiple reasoning and natural language understanding tasks. Then, we show that DLN-2 can reach higher performance than a single layer, showing promise that we might reach comparable performance to GPT-4, even when each LLM in the network is smaller and less powerful.

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net
