
Siddharth Viswanath

Alumni

Publications

A Graph Laplacian Eigenvector-based Pre-training Method for Graph Neural Networks
Howard Dai
Nyambura Njenga
Hiren Madhu
Ryan Pellico
Ian Adelstein
The development of self-supervised graph pre-training methods is a crucial ingredient in recent efforts to design robust graph foundation models (GFMs). Structure-based pre-training methods are under-explored yet crucial for downstream applications which rely on underlying graph structure. In addition, pre-training traditional message passing GNNs to capture global and regional structure is often challenging due to the risk of oversmoothing as network depth increases. We address these gaps by proposing the Laplacian Eigenvector Learning Module (LELM), a novel pre-training module for graph neural networks (GNNs) based on predicting the low-frequency eigenvectors of the graph Laplacian. Moreover, LELM introduces a novel architecture that overcomes oversmoothing, allowing the GNN model to learn long-range interdependencies. Empirically, we show that models pre-trained via our framework outperform baseline models on downstream molecular property prediction tasks.
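The pre-training target here is concrete enough to sketch. Below is a minimal illustration, not the paper's code, of how one might construct low-frequency Laplacian eigenvector targets and regress per-node GNN outputs onto them; the choice of k, the normalized Laplacian, and the sign-invariant loss are assumptions made for the example.

```python
import numpy as np
import scipy.sparse as sp
import torch
from scipy.sparse.linalg import eigsh

def laplacian_eigvec_targets(adj: sp.csr_matrix, k: int = 4) -> torch.Tensor:
    """k lowest-frequency eigenvectors of L = I - D^{-1/2} A D^{-1/2},
    skipping the trivial constant eigenvector at eigenvalue ~0."""
    deg = np.asarray(adj.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = sp.identity(adj.shape[0]) - d_inv_sqrt @ adj @ d_inv_sqrt
    _, vecs = eigsh(lap, k=k + 1, which="SM")   # k+1 smallest eigenpairs
    return torch.tensor(vecs[:, 1:], dtype=torch.float32)

def eigvec_regression_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Eigenvectors are defined only up to sign, so score each one
    against both +/- and keep the smaller error (one common workaround)."""
    err_pos = ((pred - target) ** 2).mean(dim=0)
    err_neg = ((pred + target) ** 2).mean(dim=0)
    return torch.minimum(err_pos, err_neg).sum()
```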
Learning Laplacian Eigenvectors: A Pre-training Method for Graph Neural Networks
Howard Dai
Nyambura Njenga
Benjamin Whitsett
Catherine Ma
Darwin Deng
Sara de 'Angel
Alexandre Van Tassel
Ryan Pellico
Ian Adelstein
SlepNet: Spectral Subgraph Representation Learning for Neural Dynamics
Rahul Singh
Yanlei Zhang
J. Adam Noah
Joy Hirsch
Graph neural networks have been useful in machine learning on graph-structured data, particularly for node classification and some types of graph classification tasks. However, they have had limited use in representing patterning of signals over graphs. Patterning of signals over graphs and in subgraphs carries important information in many domains, including neuroscience. Neural signals are spatiotemporally patterned, high dimensional, and difficult to decode. Graph signal processing and associated GCN models utilize the graph Fourier transform and are unable to efficiently represent spatially or spectrally localized signal patterning on graphs. Wavelet transforms have shown promise here, but offer non-canonical representations and cannot be tightly confined to subgraphs. Here we propose SlepNet, a novel GCN architecture that uses Slepian bases rather than graph Fourier harmonics. In SlepNet, the Slepian harmonics optimally concentrate signal energy on specifically relevant subgraphs that are automatically learned with a mask. Thus, they can produce canonical and highly resolved representations of neural activity, focusing the energy of the harmonics on areas of the brain which are activated. We evaluated SlepNet across three fMRI datasets, spanning cognitive and visual tasks, and two traffic dynamics datasets, comparing its performance against conventional GNNs and graph signal processing constructs. SlepNet outperforms the baselines on all datasets. Moreover, the representations of signal patterns extracted by SlepNet offer more resolution in distinguishing between similar patterns, and thus represent brain signaling transients as informative trajectories. We have shown that these extracted trajectory representations can be used for other downstream untrained tasks. Thus we establish that SlepNet is useful both for prediction and representation learning in spatiotemporal data.
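For readers unfamiliar with Slepian bases on graphs, the small sketch below shows the standard construction the abstract alludes to: among signals bandlimited to the first K Laplacian harmonics, find those whose energy is most concentrated on a given subgraph. In SlepNet the subgraph mask is learned; here it is fixed, and K and the dense eigendecomposition are simplifying assumptions.

```python
import numpy as np

def graph_slepians(adj: np.ndarray, node_mask: np.ndarray, K: int = 16):
    """Slepian basis: among signals bandlimited to the first K Laplacian
    harmonics, find those most energy-concentrated on the masked subgraph.
    adj: dense adjacency matrix; node_mask: 0/1 indicator of the subgraph S."""
    lap = np.diag(adj.sum(axis=1)) - adj            # combinatorial Laplacian
    _, U = np.linalg.eigh(lap)                      # graph Fourier basis
    B = U[:, :K]                                    # K lowest-frequency harmonics
    C = B.T @ (node_mask[:, None] * B)              # concentration operator
    mu, V = np.linalg.eigh(C)                       # mu in [0, 1]: energy on S
    return B @ V[:, ::-1], mu[::-1]                 # most concentrated first
```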
HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data
Hiren Madhu
João Felipe Rocha
Tinglin Huang
Rex Ying
Single-cell transcriptomics and proteomics have become a great source for data-driven insights into biology, enabling the use of advanced deep learning methods to understand cellular heterogeneity and gene expression at the single-cell level. With the advent of spatial-omics data, we have the promise of characterizing cells within their tissue context, as it provides both spatial coordinates and intra-cellular transcriptional or protein counts. Proteomics offers a complementary view by directly measuring proteins, which are the primary effectors of cellular function and key therapeutic targets. However, existing models either ignore the spatial information or the complex genetic and proteomic programs within cells. Thus, they cannot infer how a cell's internal regulation adapts to microenvironmental cues. Furthermore, these models often utilize fixed gene vocabularies, hindering their generalizability to unseen genes. In this paper, we introduce HEIST, a hierarchical graph transformer foundation model for spatial transcriptomics and proteomics. HEIST models tissues as hierarchical graphs: the higher level graph is a spatial cell graph, and each cell, in turn, is represented by its lower level gene co-expression network graph. HEIST performs both intra-level and cross-level message passing to exploit the hierarchy in its embeddings, and can thus generalize to novel data types, including spatial proteomics, without retraining. HEIST is pretrained on 22.3M cells from 124 tissues across 15 organs using spatially-aware contrastive and masked autoencoding objectives. Unsupervised analysis of HEIST embeddings reveals spatially informed subpopulations missed by prior models. Downstream evaluations demonstrate generalizability to proteomics data and state-of-the-art performance in clinical outcome prediction, cell type annotation, and gene imputation across multiple technologies.
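As a rough illustration of the two-level structure the abstract describes, the sketch below builds a spatial k-NN graph over cells and a thresholded gene co-expression graph. The value of k, the correlation threshold, and the function names are assumptions for illustration; HEIST's actual construction and hierarchical message passing are substantially more involved.

```python
import numpy as np
from scipy.spatial import cKDTree

def spatial_cell_edges(coords: np.ndarray, k: int = 6):
    """Higher level: connect each cell to its k nearest spatial neighbors."""
    _, idx = cKDTree(coords).query(coords, k=k + 1)   # first hit is the cell itself
    return [(i, int(j)) for i, row in enumerate(idx) for j in row[1:]]

def gene_coexpression_edges(expr: np.ndarray, thresh: float = 0.5):
    """Lower level: connect genes whose expression correlates above
    `thresh` across observations (expr: observations x genes)."""
    corr = np.corrcoef(expr.T)                        # genes x genes
    g = corr.shape[0]
    return [(i, j) for i in range(g) for j in range(i + 1, g)
            if abs(corr[i, j]) > thresh]
```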
HiPoNet: A Multi-View Simplicial Complex Network for High Dimensional Point-Cloud and Single-Cell Data
Hiren Madhu
Dhananjay Bhaskar
David R. Johnson
Christopher Tape
Ian Adelstein
Rex Ying
Michael Perlmutter
In this paper, we propose HiPoNet, an end-to-end differentiable neural network for regression, classification, and representation learning on high-dimensional point clouds. Our work is motivated by single-cell data, which can have very high dimensionality, exceeding the capabilities of existing point-cloud methods, which are mostly tailored for 3D data. Moreover, modern single-cell and spatial experiments now yield entire cohorts of datasets (i.e., one dataset for every patient), necessitating models that can process large, high-dimensional point clouds at scale. Most current approaches build a single nearest-neighbor graph, discarding important geometric and topological information. In contrast, HiPoNet models the point cloud as a set of higher-order simplicial complexes, with each particular complex being created using a reweighting of features. This method thus generates multiple constructs corresponding to different views of the high-dimensional data, which in biology offers the possibility of disentangling distinct cellular processes. It then employs simplicial wavelet transforms to extract multiscale features, capturing both local and global topology from each view. We show, both theoretically and empirically, that geometric and topological information is preserved in this framework. We showcase the utility of HiPoNet on point-cloud level tasks, involving classification and regression of entire point clouds in data cohorts. Experimentally, we find that HiPoNet outperforms other point-cloud and graph-based models on single-cell data. We also apply HiPoNet to spatial transcriptomics datasets, using spatial coordinates as one of the views. Overall, HiPoNet offers a robust and scalable solution for high-dimensional data analysis.
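The "views via feature reweighting" step can be illustrated compactly. The sketch below builds one such view as a k-NN graph over reweighted features; the weight vector would be learned end-to-end, and HiPoNet further lifts these graphs to simplicial complexes, which this toy omits.

```python
import torch

def knn_view(points: torch.Tensor, weights: torch.Tensor, k: int = 10):
    """One 'view': reweight features, then build a symmetric k-NN graph.
    points: n x d point cloud; weights: d nonnegative feature weights."""
    reweighted = points * torch.relu(weights)              # emphasize some features
    dists = torch.cdist(reweighted, reweighted)            # pairwise distances
    knn = dists.topk(k + 1, largest=False).indices[:, 1:]  # drop self-match
    adj = torch.zeros(points.shape[0], points.shape[0])
    adj.scatter_(1, knn, 1.0)
    return torch.maximum(adj, adj.T)                       # symmetrize
```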
HiPoNet: A Topology-Preserving Multi-View Neural Network For High Dimensional Point Cloud and Single-Cell Data
Hiren Madhu
Dhananjay Bhaskar
David R. Johnson
Rex Ying
Christopher Tape
Ian Adelstein
Michael Perlmutter
In this paper, we propose HiPoNet, an end-to-end differentiable neural network for regression, classification, and representation learning on high-dimensional point clouds. Single-cell data can have dimensionality exceeding the capabilities of existing point-cloud methods, which are tailored for 3D data. Moreover, modern single-cell and spatial experiments now yield entire cohorts of datasets (i.e., one for every patient), necessitating models that can process large, high-dimensional point clouds at scale. Most current approaches build a single nearest-neighbor graph, discarding important geometric information. In contrast, HiPoNet forms higher-order simplicial complexes through learnable feature reweighting, generating multiple data views that disentangle distinct biological processes. It then employs simplicial wavelet transforms to extract multi-scale features, capturing both local and global topology. We empirically show that these components preserve topological information in the learned representations, and that HiPoNet significantly outperforms state-of-the-art point-cloud and graph-based models on single-cell data. We also show an application of HiPoNet on spatial transcriptomics datasets, using spatial coordinates as one of the views. Overall, HiPoNet offers a robust and scalable solution for high-dimensional data analysis.
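As a point of reference for the wavelet component, the sketch below implements standard dyadic diffusion wavelets on an ordinary graph (the geometric-scattering construction). It is a stand-in shown under that assumption; HiPoNet applies analogous transforms on its learned simplicial complexes.

```python
import numpy as np

def diffusion_wavelet_features(adj: np.ndarray, x: np.ndarray, J: int = 3):
    """Dyadic diffusion wavelets psi_j = P^{2^{j-1}} - P^{2^j} applied to
    a node signal x, with P the lazy random-walk operator."""
    deg = adj.sum(axis=1, keepdims=True)
    P = 0.5 * (np.eye(len(adj)) + adj / np.maximum(deg, 1e-12))
    prev = P @ x                                   # P^{2^0} x
    feats = [np.abs(x - prev)]                     # psi_0 = I - P
    for j in range(1, J):
        cur = prev
        for _ in range(2 ** (j - 1)):              # advance P^{2^{j-1}} -> P^{2^j}
            cur = P @ cur
        feats.append(np.abs(prev - cur))           # band-pass response at scale j
        prev = cur
    return np.concatenate(feats, axis=-1)
```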
Exploring the Manifold of Neural Networks Using Diffusion Geometry
Elliott Abel
Peyton Crevasse
Yvan Grinspan
Selma Mazioud
Folu Ogundipe
Kristof Reimann
Ellie Schueler
Andrew J. Steindl
Ellen Zhang
Dhananjay Bhaskar
Yanlei Zhang
Tim G. J. Rudner
Ian Adelstein
Drawing motivation from the manifold hypothesis, which posits that most high-dimensional data lies on or near low-dimensional manifolds, we apply manifold learning to the space of neural networks. We learn manifolds where datapoints are neural networks by introducing a distance between the hidden layer representations of the neural networks. These distances are then fed to the non-linear dimensionality reduction algorithm PHATE to create a manifold of neural networks. We characterize this manifold using features of the representation, including class separation, hierarchical cluster structure, spectral entropy, and topological structure. Our analysis reveals that high-performing networks cluster together in the manifold, displaying consistent embedding patterns across all these features. Finally, we demonstrate the utility of this approach for guiding hyperparameter optimization and neural architecture search by sampling from the manifold.
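The pipeline lends itself to a short sketch: compute a pairwise distance between networks from their hidden-layer representations on a shared probe batch, then embed the distance matrix with PHATE. Linear CKA is used below as a plausible stand-in for the paper's representation distance, which the abstract does not specify, and the phate package's precomputed-distance mode is assumed.

```python
import numpy as np
import phate

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Similarity between two representation matrices (samples x units)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

def network_manifold(reps):
    """reps[i]: hidden activations of network i on the same probe inputs."""
    n = len(reps)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = 1.0 - linear_cka(reps[i], reps[j])
    # 'precomputed' tells PHATE to treat D as a distance matrix
    return phate.PHATE(knn_dist="precomputed", verbose=False).fit_transform(D)
```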