Portrait of Alexander Tong is unavailable

Alexander Tong

Alumni

Publications

DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets
Jason Hartford
Leo J Lee
Bo Wang
One of the grand challenges of cell biology is inferring the gene regulatory network (GRN) which describes interactions between genes and th… (see more)eir products that control gene expression and cellular function. We can treat this as a causal discovery problem but with two non-standard challenges: (1) regulatory networks are inherently cyclic so we should not model a GRN as a directed acyclic graph (DAG), and (2) observations have significant measurement noise, so for typical sample sizes there will always be a large equivalence class of graphs that are likely given the data, and we want methods that capture this uncertainty. Existing methods either focus on challenge (1), identifying cyclic structure from dynamics, or on challenge (2) learning complex Bayesian posteriors over DAGs, but not both. In this paper we leverage the fact that it is possible to estimate the"velocity"of gene expression with RNA velocity techniques to develop an approach that addresses both challenges. Because we have access to velocity information, we can treat the Bayesian structure learning problem as a problem of sparse identification of a dynamical system, capturing cyclic feedback loops through time. Since our objective is to model uncertainty over discrete structures, we leverage Generative Flow Networks (GFlowNets) to estimate the posterior distribution over the combinatorial space of possible sparse dependencies. Our results indicate that our method learns posteriors that better encapsulate the distributions of cyclic structures compared to counterpart state-of-the-art Bayesian structure learning approaches.
Graph Fourier MMD for signals on data graphs
While numerous methods have been proposed for computing distances between probability distributions in Euclidean space, relatively little at… (see more)tention has been given to computing such distances for distributions on graphs. However, there has been a marked increase in data that either lies on graph (such as protein interaction networks) or can be modeled as a graph (single cell data), particularly in the biomedical sciences. Thus, it becomes important to find ways to compare signals defined on such graphs. Here, we propose Graph Fourier MMD (GFMMD), a novel a distance between distributions, or non-negative signals on graphs. GFMMD is defined via an optimal witness function that is both smooth on the graph and maximizes difference in expectation between the pair of distributions on the graph. We find an analytical solution to this optimization problem as well as an embedding of distributions that results from this method. We also prove several properties of this method including scale invariance and applicability to disconnected graphs. We showcase it on graph benchmark datasets as well on single cell RNA-sequencing data analysis. In the latter, we use the GFMMD-based gene embeddings to find meaningful gene clusters. We also propose a novel type of score for gene selection called {\em gene localization score} which helps select genes for cellular state space characterization.
Improving and generalizing flow-based generative models with minibatch optimal transport
Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their si… (see more)mulation-based maximum likelihood training. We introduce the generalized conditional flow matching (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, we show that when the true OT plan is available, our OT-CFM method approximates dynamic OT. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single cell dynamics, unsupervised image translation, and Schr\"odinger bridge inference.
Improving and generalizing flow-based generative models with minibatch optimal transport
Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their si… (see more)mulation-based maximum likelihood training. We introduce the generalized conditional flow matching (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, we show that when the true OT plan is available, our OT-CFM method approximates dynamic OT. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single cell dynamics, unsupervised image translation, and Schr\"odinger bridge inference.
Improving and generalizing flow-based generative models with minibatch optimal transport
Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their si… (see more)mulation-based maximum likelihood training. We introduce the generalized conditional flow matching (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, we show that when the true OT plan is available, our OT-CFM method approximates dynamic OT. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single cell dynamics, unsupervised image translation, and Schr\"odinger bridge inference.
Improving and generalizing flow-based generative models with minibatch optimal transport
Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their si… (see more)mulation-based maximum likelihood training. We introduce the generalized conditional flow matching (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, we show that when the true OT plan is available, our OT-CFM method approximates dynamic OT. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single cell dynamics, unsupervised image translation, and Schr\"odinger bridge inference.
Conditional Flow Matching: Simulation-Free Dynamic Optimal Transport
Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have thus far been held back by limitations i… (see more)n their simulation-based maximum likelihood training. In this paper, we introduce a new technique called conditional flow matching (CFM), a simulation-free training objective for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, our CFM objec-tive does not require the source distribution to be Gaussian or require evaluation of its density. Based on this new objective, we also introduce optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks such as inferring single cell dynamics, unsupervised image translation, and Schr ¨ odinger bridge inference. Code is available at https://github.com/atong01/ conditional-flow-matching .
Conditional Flow Matching: Simulation-Free Dynamic Optimal Transport
DynGFN: Bayesian Dynamic Causal Discovery using Generative Flow Networks
Jason Hartford
Leo Jingyu Lee
Bo Wang
Learning the causal structure of observable variables is a central focus for scientific discovery. Bayesian causal discovery methods tackle… (see more) this problem by learning a posterior over the set of admissible graphs given our priors and observations. Existing methods primarily consider observations from static systems and assume the underlying causal structure takes the form of a directed acyclic graph (DAG). In settings with dynamic feedback mechanisms that regulate the trajectories of individual variables, this acyclicity assumption fails unless we account for time. We focus on learning Bayesian posteriors over cyclic graphs and treat causal discovery as a problem of sparse identification of a dynamical sys-tem. This imposes a natural temporal causal order between variables and captures cyclic feedback loops through time. Under this lens, we propose a new framework for Bayesian causal discovery for dynamical systems and present a novel generative flow network architecture (DynGFN) tailored for this task. Our results indicate that DynGFN learns posteriors that better encapsulate the distributions over admissible cyclic causal structures compared to counterpart state-of-the-art approaches.
GEODESIC SINKHORN FOR FAST AND ACCURATE OPTIMAL TRANSPORT ON MANIFOLDS
María Ramos Zapatero
Christopher J. Tape
Efficient computation of optimal transport distance between distributions is of growing importance in data science. Sinkhorn-based methods a… (see more)re currently the state-of-the-art for such computations, but require O(n2) computations. In addition, Sinkhorn-based methods commonly use an Euclidean ground distance between datapoints. However, with the prevalence of manifold structured scientific data, it is often desirable to consider geodesic ground distance. Here, we tackle both issues by proposing Geodesic Sinkhorn—based on diffusing a heat kernel on a manifold graph. Notably, Geodesic Sinkhorn requires only O(n log n) computation, as we approximate the heat kernel with Chebyshev polynomials based on the sparse graph Laplacian. We apply our method to the computation of barycenters of several distributions of high dimensional single cell data from patient samples undergoing chemotherapy. In particular, we define the barycentric distance as the distance between two such barycenters. Using this definition, we identify an optimal transport distance and path associated with the effect of treatment on cellular data.
A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction
Edward De Brouwer
Yanlei Zhang
Ian Adelstein
Bayesian Dynamic Causal Discovery
Learning the causal structure of observable variables is a central focus for scientific discovery. Bayesian causal discovery methods tackle … (see more)this problem by learning a posterior over the set of admissible graphs that are equally likely given our priors and observations. Existing methods primarily consider observations from static systems and assume the underlying causal structure takes the form of a directed acyclic graph (DAG). In settings with dynamic feedback mechanisms that regulate the trajectories of individual variables, this acyclicity assumption fails unless we account for time. We treat causal discovery in the unrolled causal graph as a problem of sparse identification of a dynamical system. This imposes a natural temporal causal order between variables and captures cyclic feedback loops through time. Under this lens, we propose a new framework for Bayesian causal discovery for dynamical systems and present a novel generative flow network architecture (Dyn-GFN) tailored for this task. Dyn-GFN imposes an edge-wise sparse prior to sequentially build a k -sparse causal graph. Through evaluation on temporal data, our results show that the posterior learned with Dyn-GFN yields improved Bayes coverage of admissible causal structures relative to state of the art Bayesian causal discovery methods.