Smita Krishnaswamy

Christine L. Chaffer

Identifying functionally important cell states and structure within a heterogeneous tumor remains a significant biological and computational challenge. Moreover, current clustering or trajectory-based computational models are ill-equipped to address the notion that cancer cells reside along a phenotypic continuum. To address this, we present Archetypal Analysis network (AAnet), a neural network that learns key archetypal cell states within a phenotypic continuum of cell states in single-cell data. Applied to single-cell RNA sequencing data from pre-clinical models and a cohort of 34 clinical breast cancers, AAnet identifies archetypes that resolve distinct biological cell states and processes, including cell proliferation, hypoxia, metabolism and immune interactions. Notably, archetypes identified in primary tumors are recapitulated in matched liver, lung and lymph node metastases, demonstrating that a significant component of intratumoral heterogeneity is driven by cell intrinsic properties. Using spatial transcriptomics as orthogonal validation, AAnet-derived archetypes show discrete spatial organization within tumors, supporting their distinct archetypal biology. We further reveal that ligand:receptor cross-talk between cancer and adjacent stromal cells contributes to intra-archetypal biological mimicry. Finally, we use AAnet archetype identifiers to validate GLUT3 as a critical mediator of a hypoxic cell archetype harboring a cancer stem cell population, which we validate in human triple-negative breast cancer specimens. AAnet is a powerful tool to reveal functional cell states within complex samples from multimodal single-cell data.

2024-05-14

bioRxiv (preprint)

BLIS-Net: Classifying and Analyzing Signals on Graphs

Charles Xu

Laney Goldman

Valentina Guo

Benjamin Hollander-Bodie

Maedee Trank-Greene

Ian Adelstein

Edward De Brouwer

Rex Ying

Michael Perlmutter

Graph neural networks (GNNs) have emerged as a powerful tool for tasks such as node classification and graph classification. However, much less work has been done on signal classification, where the data consists of many functions (referred to as signals) defined on the vertices of a single graph. These tasks require networks designed differently from those designed for traditional GNN tasks. Indeed, traditional GNNs rely on localized low-pass filters, and signals of interest may have intricate multi-frequency behavior and exhibit long range interactions. This motivates us to introduce the BLIS-Net (Bi-Lipschitz Scattering Net), a novel GNN that builds on the previously introduced geometric scattering transform. Our network is able to capture both local and global signal structure and is able to capture both low-frequency and high-frequency information. We make several crucial changes to the original geometric scattering architecture which we prove increase the ability of our network to capture information about the input signal and show that BLIS-Net achieves superior performance on both synthetic and real-world data sets based on traffic flow and fMRI data.

2024-04-18

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (published)

Directed Scattering for Knowledge Graph-Based Cellular Signaling Analysis

Aarthi Venkat

Joyce Chew

Ferran Cardoso Rodriguez

Christopher J. Tape

Michael Perlmutter

Directed graphs are a natural model for many phenomena, in particular scientific knowledge graphs such as molecular interaction or chemical reaction networks that define cellular signaling relationships. In these situations, source nodes typically have distinct biophysical properties from sinks. Due to their ordered and unidirectional relationships, many such networks also have hierarchical and multiscale structure. However, the majority of methods performing node- and edge-level tasks in machine learning do not take these properties into account, and thus have not been leveraged effectively for scientific tasks such as cellular signaling network inference. We propose a new framework called Directed Scattering Autoencoder (DSAE) which uses a directed version of a geometric scattering transform, combined with the non-linear dimensionality reduction properties of an autoencoder and the geometric properties of the hyperbolic space to learn latent hierarchies. We show this method outperforms numerous others on tasks such as embedding directed graphs and learning cellular signaling networks.

2024-04-14

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (published)

Bayesian Spectral Graph Denoising with Smoothness Prior

Samuel Leone

Xingzhi Sun

Michael Perlmutter

Here we consider the problem of denoising features associated to complex data, modeled as signals on a graph, via a smoothness prior. This is motivated in part by settings such as single-cell RNA where the data is very high-dimensional, but its structure can be captured via an affinity graph. This allows us to utilize ideas from graph signal processing. In particular, we present algorithms for the cases where the signal is perturbed by Gaussian noise, dropout, and uniformly distributed noise. The signals are assumed to follow a prior distribution defined in the frequency domain which favors signals which are smooth across the edges of the graph. By pairing this prior distribution with our three models of noise generation, we propose Maximum A Posteriori (M.A.P.) estimates of the true signal in the presence of noisy data and provide algorithms for computing the M.A.P. Finally, we demonstrate the algorithms’ ability to effectively restore signals from white noise on image data and from severe dropout in single-cell RNA sequence data.

2024-03-13

Annual Conference on Information Sciences and Systems (published)

Abstract B049: Pancreatic beta cell stress pathways drive pancreatic ductal adenocarcinoma development in obesity

Cathy C. Garcia

Aarthi Venkat

Alex Tong

Sherry Agabiti

Lauren Lawres

Rebecca Cardone

Richard G. Kibbey

Mandar Deepak Muzumdar

2024-01-16

Cancer Research (published)

Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy

Danqi Liao

Chen Liu

Benjamin W Christensen

Alexander Tong

Guillaume Huguet

Maximilian Nickel

Ian Adelstein

Entropy and mutual information in neural networks provide rich information on the learning process, but they have proven difficult to compute reliably in high dimensions. Indeed, in noisy and high-dimensional data, traditional estimates in ambient dimensions approach a fixed entropy and are prohibitively hard to compute. To address these issues, we leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures. Specifically, we define diffusion spectral entropy (DSE) in neural representations of a dataset as well as diffusion spectral mutual information (DSMI) between different variables representing data. First, we show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data that outperform classic Shannon entropy, nonparametric estimation, and mutual information neural estimation (MINE). We then study the evolution of representations in classification networks with supervised learning, self-supervision, or overfitting. We observe that (1) DSE of neural representations increases during training; (2) DSMI with the class label increases during generalizable learning but stays stagnant during overfitting; (3) DSMI with the input signal shows differing trends: on MNIST it increases, while on CIFAR-10 and STL-10 it decreases. Finally, we show that DSE can be used to guide better network initialization and that DSMI can be used to predict downstream classification accuracy across 962 models on ImageNet.

2024-01-01

CISS (published)

Learnable Filters for Geometric Scattering Modules

Alexander Tong

Frederik Wenkel

Dhananjay Bhaskar

Kincaid MacDonald

Jackson Grady

Michael Perlmutter

2024-01-01

IEEE Transactions on Signal Processing (published)

Low-Dimensional Embeddings of High-Dimensional Data: Algorithms and Applications (Dagstuhl Seminar 24122).

Dmitry Kobak

Fred Hamprecht

Gal Mishne

Sebastian Damrich

2024-01-01

Dagstuhl Reports (published)

Inferring dynamic regulatory interaction graphs from time series data with perturbations

Dhananjay Bhaskar

Daniel Sumner Magruder

Edward De Brouwer

Matheo Morales

Aarthi Venkat

Frederik Wenkel

2023-11-18

logconference.io/LOG/2023/Conference (poster)

Graph topological property recovery with heat and wave dynamics-based features on graphs

Dhananjay Bhaskar

Yanlei Zhang

Charles Xu

Xingzhi Sun

Oluwadamilola Fasina

Maximilian Nickel

Michael Perlmutter

2023-09-18

ArXiv (preprint)

Neural FIM for learning Fisher information metrics from point cloud data

Oluwadamilola Fasina

Guillaume Huguet

Alexander Tong

Yanlei Zhang

Maximilian Nickel

Ian Adelstein

Although data diffusion embeddings are ubiquitous in unsupervised learning and have proven to be a viable technique for uncovering the underlying intrinsic geometry of data, diffusion embeddings are inherently limited due to their discrete nature. To this end, we propose neural FIM, a method for computing the Fisher information metric (FIM) from point cloud data, allowing for a continuous manifold model for the data. Neural FIM creates an extensible metric space from discrete point cloud data such that information from the metric can inform us of manifold characteristics such as volume and geodesics. We demonstrate Neural FIM’s utility in selecting parameters for the PHATE visualization method as well as its ability to obtain information pertaining to local volume, illuminating branching points and cluster centers in embeddings of a toy dataset and two single-cell datasets of IPSC reprogramming and PBMCs (immune cells).

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (published)

Graph Fourier MMD for Signals on Graphs

Samuel Leone

Aarthi Venkat

While numerous methods have been proposed for computing distances between probability distributions in Euclidean space, relatively little attention has been given to computing such distances for distributions on graphs. However, there has been a marked increase in data that either lies on a graph (such as protein interaction networks) or can be modeled as a graph (single cell data), particularly in the biomedical sciences. Thus, it becomes important to find ways to compare signals defined on such graphs. Here, we propose Graph Fourier MMD (GFMMD), a novel distance between distributions and signals on graphs. GFMMD is defined via an optimal witness function that is both smooth on the graph and maximizes the difference in expectation between the pair of distributions on the graph. We find an analytical solution to this optimization problem as well as an embedding of distributions that results from this method. We also prove several properties of this method including scale invariance and applicability to disconnected graphs. We showcase it on graph benchmark datasets as well as on single cell RNA-sequencing data analysis. In the latter, we use the GFMMD-based gene embeddings to find meaningful gene clusters. We also propose a novel type of score for gene selection called gene localization score which helps select genes for cellular state space characterization.

2023-05-21

SampTA/2023/Conference (published)