Smita Krishnaswamy

John A. Lee

Boudewijn P. F. Lelieveldt

Leland McInnes

Ian T. Nabney

Maximilian Noichl

Pavlin G. Poličar

Bastian Rieck

Guy Wolf

Gal Mishne … (1 more)

Dmitry Kobak

Large collections of high-dimensional data have become nearly ubiquitous across many academic fields and application domains, ranging from biology to the humanities. Since working directly with high-dimensional data poses challenges, the demand for algorithms that create low-dimensional representations, or embeddings, for data visualization, exploration, and analysis is now greater than ever. In recent years, numerous embedding algorithms have been developed, and their usage has become widespread in research and industry. This surge of interest has resulted in a large and fragmented research field that faces technical challenges alongside fundamental debates, and it has left practitioners without clear guidance on how to effectively employ existing methods. Aiming to increase coherence and facilitate future work, in this review we provide a detailed and critical overview of recent developments, derive a list of best practices for creating and using low-dimensional embeddings, evaluate popular approaches on a variety of datasets, and discuss the remaining challenges and open problems in the field.

2025-08-01

arXiv (published)

STAGED: A Multi-Agent Neural Network for Learning Cellular Interaction Dynamics

João Felipe Rocha

Ke Xu

Xingzhi Sun

Ananya Krishna

Dhananjay Bhaskar

Blanche Mongeon

Morgan Craig

Mark B. Gerstein

The advent of single-cell technology has significantly improved our understanding of cellular states and subpopulations in various tissues under normal and diseased conditions by employing data-driven approaches such as clustering and trajectory inference. However, these methods consider cells as independent data points of population distributions. With spatial transcriptomics, we can represent cellular organization, along with dynamic cell-cell interactions that lead to changes in cell state. Still, key computational advances are necessary to enable the data-driven learning of such complex interactive cellular dynamics. While agent-based modeling (ABM) provides a powerful framework, traditional approaches rely on handcrafted rules derived from domain knowledge rather than data-driven approaches. To address this, we introduce Spatio Temporal Agent-Based Graph Evolution Dynamics (STAGED), which integrates ABM with deep learning to model intercellular communication and its effect on the intracellular gene regulatory network. Using graph ODE networks (GDEs) with shared weights per cell type, our approach represents genes as vertices and interactions as directed edges, dynamically learning their strengths through a designed attention mechanism. Trained to match both simulated trajectories and trajectories inferred from spatial transcriptomics data, the model captures both intercellular and intracellular interactions, enabling a more adaptive and accurate representation of cellular dynamics.

2025-07-15

arXiv (preprint)

SlepNet: Spectral Subgraph Representation Learning for Neural Dynamics

Siddharth Viswanath

Rahul Singh

Yanlei Zhang

J. Adam Noah

Joy Hirsch

Graph neural networks have been useful in machine learning on graph-structured data, particularly for node classification and some types of graph classification tasks. However, they have had limited use in representing patterning of signals over graphs. Patterning of signals over graphs and in subgraphs carries important information in many domains, including neuroscience. Neural signals are spatiotemporally patterned, high-dimensional, and difficult to decode. Graph signal processing and associated GCN models utilize the graph Fourier transform and are unable to efficiently represent spatially or spectrally localized signal patterning on graphs. Wavelet transforms have shown promise here, but offer non-canonical representations and cannot be tightly confined to subgraphs. Here we propose SlepNet, a novel GCN architecture that uses Slepian bases rather than graph Fourier harmonics. In SlepNet, the Slepian harmonics optimally concentrate signal energy on specifically relevant subgraphs that are automatically learned with a mask. Thus, they can produce canonical and highly resolved representations of neural activity, focusing energy of harmonics on areas of the brain which are activated. We evaluated SlepNet across three fMRI datasets, spanning cognitive and visual tasks, and two traffic dynamics datasets, comparing its performance against conventional GNNs and graph signal processing constructs. SlepNet outperforms the baselines on all datasets. Moreover, the representations of signal patterns extracted from SlepNet offer more resolution in distinguishing between similar patterns, and thus represent brain signaling transients as informative trajectories. Here we have shown that these extracted trajectory representations can be used for other downstream untrained tasks. Thus we establish that SlepNet is useful both for prediction and representation learning in spatiotemporal data.

2025-06-19

arXiv (preprint)

HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data

Hiren Madhu

João Felipe Rocha

Tinglin Huang

Siddharth Viswanath

Rex Ying

Single-cell transcriptomics and proteomics have become a great source for data-driven insights into biology, enabling the use of advanced deep learning methods to understand cellular heterogeneity and gene expression at the single-cell level. With the advent of spatial-omics data, we have the promise of characterizing cells within their tissue context as it provides both spatial coordinates and intra-cellular transcriptional or protein counts. Proteomics offers a complementary view by directly measuring proteins, which are the primary effectors of cellular function and key therapeutic targets. However, existing models ignore either the spatial information or the complex genetic and proteomic programs within cells. Thus, they cannot infer how cell internal regulation adapts to microenvironmental cues. Furthermore, these models often utilize fixed gene vocabularies, hindering their generalizability to unseen genes. In this paper, we introduce HEIST, a hierarchical graph transformer foundation model for spatial transcriptomics and proteomics. HEIST models tissues as hierarchical graphs. The higher level graph is a spatial cell graph, and each cell, in turn, is represented by its lower level gene co-expression network graph. HEIST achieves this by performing both intra-level and cross-level message passing to utilize the hierarchy in its embeddings and can thus generalize to novel datatypes including spatial proteomics without retraining. HEIST is pretrained on 22.3M cells from 124 tissues across 15 organs using spatially-aware contrastive and masked autoencoding objectives. Unsupervised analysis of HEIST embeddings reveals spatially informed subpopulations missed by prior models. Downstream evaluations demonstrate generalizability to proteomics data and state-of-the-art performance in clinical outcome prediction, cell type annotation, and gene imputation across multiple technologies.

2025-06-11

arXiv (preprint)

ImmunoStruct: a multimodal neural network framework for immunogenicity prediction from peptide-MHC sequence, structure, and biochemical properties

Kevin Bijan Givechian

João Felipe Rocha

Edward Yang

Chen Liu

Kerrie Greene

Rex Ying

Etienne Caron

Akiko Iwasaki

2025-05-21

Research Square (published)

Neurospectrum: A Geometric and Topological Deep Learning Framework for Uncovering Spatiotemporal Signatures in Neural Activity

Dhananjay Bhaskar

Yanlei Zhang

Jessica Moore

Feng Gao

Bastian Rieck

Guy Wolf

Firas Khasawneh

Elizabeth Munch

J. Adam Noah

Helen Pushkarskaya

Christopher Pittenger

Valentina Greco

2025-05-08

bioRxiv (preprint)

Neurospectrum: A Geometric and Topological Deep Learning Framework for Uncovering Spatiotemporal Signatures in Neural Activity

Dhananjay Bhaskar

Jessica Moore

Feng Gao

Bastian Rieck

Firas Khasawneh

Elizabeth Munch

Valentina Greco

Neural signals are high-dimensional, noisy, and dynamic, making it challenging to extract interpretable features linked to behavior or disease. We introduce Neurospectrum, a framework that encodes neural activity as latent trajectories shaped by spatial and temporal structure. At each timepoint, signals are represented on a graph capturing spatial relationships, with a learnable attention mechanism highlighting important regions. These are embedded using graph wavelets and passed through a manifold-regularized autoencoder that preserves temporal geometry. The resulting latent trajectory is summarized using a principled set of descriptors (including curvature, path signatures, persistent homology, and recurrent networks) that capture multiscale geometric, topological, and dynamical features. These features drive downstream prediction in a modular, interpretable, and end-to-end trainable framework. We evaluate Neurospectrum on simulated and experimental datasets. It tracks phase synchronization in Kuramoto simulations, reconstructs visual stimuli from calcium imaging, and identifies biomarkers of obsessive-compulsive disorder in fMRI. Across tasks, Neurospectrum uncovers meaningful neural dynamics and outperforms traditional analysis methods.

2025-05-08

bioRxiv (preprint)

HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts

Neil He

Rishabh Anand

Hiren Madhu

Ali Maatouk

Leandros Tassiulas

Menglin Yang

Rex Ying

2025-05-01

arXiv (published)

ImmunoStruct: a multimodal neural network framework for immunogenicity prediction from peptide-MHC sequence, structure, and biochemical properties

Kevin Bijan Givechian

João Felipe Rocha

Edward Yang

Chen Liu

Kerrie Greene

Rex Ying

Etienne Caron

Akiko Iwasaki

2025-04-30

bioRxiv (preprint)

InfoGain Wavelets: Furthering the Design of Graph Diffusion Wavelets

David R. Johnson

Michael Perlmutter

Diffusion wavelets extract information from graph signals at different scales of resolution by utilizing graph diffusion operators raised to various powers, known as diffusion scales. Traditionally, these scales are chosen to be dyadic integers,

2025-04-08

arXiv (preprint)

InfoGain Wavelets: Furthering the Design of Diffusion Wavelets for Graph-Structured Data

David R. Johnson

Michael Perlmutter

2025-04-08

arXiv (preprint)