Smita Krishnaswamy

Affiliate Member
Associate Professor, Yale University
Université de Montréal
Yale
Research Topics
Representation Learning
Deep Learning
Geometric Deep Learning
Spectral Learning
Manifold Learning
Computational Biology
Data Geometry
AI in Health
Brain-Computer Interfaces
Generative Models
Molecular Modeling
Computational Neuroscience
Data Sparsity
Graph Neural Networks
Cognitive Science
Data Science
Dynamical Systems
Information Theory

Biography

Our lab develops foundational mathematical machine learning and deep learning methods that integrate graph-based learning, signal processing, information theory, data geometry and topology, optimal transport, and dynamics modeling. These methods support exploratory analysis, scientific inference, interpretation, and hypothesis generation on large biomedical datasets, ranging from single-cell data to brain imaging to molecular structural datasets, drawn from neuroscience, psychology, stem cell biology, cancer biology, healthcare, and biochemistry. Our work has been instrumental in learning dynamic trajectories from static snapshot data, data denoising, visualization, network inference, molecular structure modeling, and more.

Current Students

Research Collaborator - Yale University
Principal supervisor:

Publications

Revealing dynamic temporal trajectories and underlying regulatory networks with Cflows
Manik Kuchroo
Shabarni Gupta
Aarthi Venkat
Chen Liu
Beatriz P. San Juan
Laura Rangel
Brandon Zhu
John G. Lock
Christine L. Chaffer
While single-cell technologies provide snapshots of tumor states, building continuous trajectories and uncovering causative gene regulatory networks remains a significant challenge. We present Cflows, an AI framework that combines neural ODE networks with Granger causality to infer continuous cell state transitions and gene regulatory interactions from static scRNA-seq data. In a new 5-time point dataset capturing tumorsphere development over 30 days, Cflows reconstructs two types of trajectories leading to tumorsphere formation or apoptosis. Trajectory-based cell-of-origin analysis delineated a novel cancer stem cell profile characterized by CD44^hi EPCAM+ CAV1+, and uncovered a cell cycle–dependent enrichment of tumorsphere-initiating potential in G2/M or S-phase cells. Cflows uncovers ESRRA as a crucial causal driver of the tumor-forming gene regulatory network. Indeed, ESRRA inhibition significantly reduces tumor growth and metastasis in vivo. Cflows offers a powerful framework for uncovering cellular transitions and dynamic regulatory networks from static single-cell data.
Manifold Filter-Combine Networks
Joyce Chew
Edward De Brouwer
Deanna Needell
Michael Perlmutter
To better understand manifold neural networks (MNNs), we introduce Manifold Filter-Combine Networks (MFCNs). Our filter-combine framework parallels the popular aggregate-combine paradigm for graph neural networks (GNNs) and naturally suggests many interesting families of MNNs which can be interpreted as manifold analogues of various popular GNNs. We propose a method for implementing MFCNs on high-dimensional point clouds that relies on approximating an underlying manifold by a sparse graph. We then prove that our method is consistent in the sense that it converges to a continuum limit as the number of data points tends to infinity, and we numerically demonstrate its effectiveness on real-world and synthetic data sets.
CellForge: Agentic Design of Virtual Cell Models
Xiangru Tang
Zhuoyun Yu
Jiapeng Chen
Yan Cui
Yanjun Shao
Weixu Wang
Fang Wu
Yuchen Zhuang
Wenqi Shi
Zhi Huang
Arman Cohan
Xihong Lin
Fabian Theis
Mark B. Gerstein
Virtual cell modeling represents an emerging frontier at the intersection of artificial intelligence and biology, aiming to quantitatively predict quantities such as responses to diverse perturbations. However, autonomously building computational models for virtual cells is challenging due to the complexity of biological systems, the heterogeneity of data modalities, and the need for domain-specific expertise across multiple disciplines. Here, we introduce CellForge, an agentic system that leverages a multi-agent framework to transform presented biological datasets and research objectives directly into optimized computational models for virtual cells. More specifically, given only raw single-cell multi-omics data and task descriptions as input, CellForge outputs both an optimized model architecture and executable code for training virtual cell models and running inference. The framework integrates three core modules: Task Analysis, for characterizing the presented dataset and retrieving relevant literature; Method Design, where specialized agents collaboratively develop optimized modeling strategies; and Experiment Execution, for automated generation of code. The agents in the Design module are separated into experts with differing perspectives and a central moderator, and must collaboratively exchange solutions until they reach a reasonable consensus. We demonstrate CellForge's capabilities in single-cell perturbation prediction, using six diverse datasets that encompass gene knockouts, drug treatments, and cytokine stimulations across multiple modalities. CellForge consistently outperforms task-specific state-of-the-art methods. Overall, CellForge demonstrates how iterative interaction between LLM agents with differing perspectives provides better solutions than directly addressing a modeling challenge. Our code is publicly available at https://github.com/gersteinlab/CellForge.
STAGED: A Multi-Agent Neural Network for Learning Cellular Interaction Dynamics
João Felipe Rocha
Ke Xu
Xingzhi Sun
Ananya Krishna
Dhananjay Bhaskar
Blanche Mongeon
Morgan Craig
Mark B. Gerstein
The advent of single-cell technology has significantly improved our understanding of cellular states and subpopulations in various tissues under normal and diseased conditions by employing data-driven approaches such as clustering and trajectory inference. However, these methods consider cells as independent data points of population distributions. With spatial transcriptomics, we can represent cellular organization, along with dynamic cell-cell interactions that lead to changes in cell state. Still, key computational advances are necessary to enable the data-driven learning of such complex interactive cellular dynamics. While agent-based modeling (ABM) provides a powerful framework, traditional approaches rely on handcrafted rules derived from domain knowledge rather than data-driven approaches. To address this, we introduce Spatio Temporal Agent-Based Graph Evolution Dynamics (STAGED), which integrates ABM with deep learning to model intercellular communication and its effect on the intracellular gene regulatory network. Using graph ODE networks (GDEs) with shared weights per cell type, our approach represents genes as vertices and interactions as directed edges, dynamically learning their strengths through a designed attention mechanism. Trained to match continuous trajectories of simulated as well as inferred trajectories from spatial transcriptomics data, the model captures both intercellular and intracellular interactions, enabling a more adaptive and accurate representation of cellular dynamics.
SlepNet: Spectral Subgraph Representation Learning for Neural Dynamics
Rahul Singh
Yanlei Zhang
J. Adam Noah
Joy Hirsch
Graph neural networks have been useful in machine learning on graph-structured data, particularly for node classification and some types of graph classification tasks. However, they have had limited use in representing patterning of signals over graphs. Patterning of signals over graphs and in subgraphs carries important information in many domains, including neuroscience. Neural signals are spatiotemporally patterned, high dimensional, and difficult to decode. Graph signal processing and associated GCN models utilize the graph Fourier transform and are unable to efficiently represent spatially or spectrally localized signal patterning on graphs. Wavelet transforms have shown promise here, but offer non-canonical representations and cannot be tightly confined to subgraphs. Here we propose SlepNet, a novel GCN architecture that uses Slepian bases rather than graph Fourier harmonics. In SlepNet, the Slepian harmonics optimally concentrate signal energy on specifically relevant subgraphs that are automatically learned with a mask. Thus, they can produce canonical and highly resolved representations of neural activity, focusing the energy of harmonics on areas of the brain which are activated. We evaluated SlepNet across three fMRI datasets, spanning cognitive and visual tasks, and two traffic dynamics datasets, comparing its performance against conventional GNNs and graph signal processing constructs. SlepNet outperforms the baselines in all datasets. Moreover, the representations of signal patterns extracted from SlepNet offer more resolution in distinguishing between similar patterns, and thus represent brain signaling transients as informative trajectories. Here we have shown that these extracted trajectory representations can be used for other downstream untrained tasks. Thus we establish that SlepNet is useful both for prediction and representation learning in spatiotemporal data.
ImmunoStruct: a multimodal neural network framework for immunogenicity prediction from peptide-MHC sequence, structure, and biochemical properties
Kevin Bijan Givechian
João Felipe Rocha
Edward Yang
Chen Liu
Kerrie Greene
Rex Ying
Etienne Caron
Akiko Iwasaki
HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts
Neil He
Rishabh Anand
Hiren Madhu
Ali Maatouk
Leandros Tassiulas
Menglin Yang
Rex Ying
Recovering undersampled single-cell transcriptomes with HyperCell
Single-cell transcriptomic technology has now matured, allowing quantification of mRNA transcripts corresponding to tens of thousands of genes within a cell. However, only a small fraction of these mRNAs is captured and measured by today’s single-cell assays. There are likely hundreds of thousands of mRNA copies present within a typical human cell, yet these assays omit a majority of the transcripts that are actually present. This introduces technical noise, especially non-biological variability and excessive sparsity, which frustrates downstream analysis and potentially skews biological conclusions. To overcome these challenges, we here develop HyperCell, a probabilistic deep learning approach that explicitly models this undersampling to produce estimates of each cell’s original gene transcript abundances across the whole transcriptome. We demonstrate that our framework offers benefits in various mRNA modeling settings, by i) correctly differentiating between spurious sampling-induced and real biological zeros, outperforming existing approaches, ii) estimating the total mRNA content of cells across states, iii) reducing contamination due to background transcripts, and iv) helping to counteract biases that may appear during typical differential gene expression analyses using widespread normalization approaches. Our approach to correcting for the technical noise introduced by the single-cell experimental process brings us closer to studying biology, starting from the true transcriptome of cells.

Deep Learning Unlocks the True Potential of Organ Donation after Circulatory Death with Accurate Prediction of Time-to-Death
Xingzhi Sun
Edward De Brouwer
Chen Liu
Ramesh Batra
Increasing the number of organ donations after circulatory death (DCD) has been identified as one of the most important ways of addressing the ongoing organ shortage. While recent technological advances in organ transplantation have increased their success rate, a substantial challenge in increasing the number of DCD donations resides in the uncertainty regarding the timing of cardiac death after terminal extubation, impacting the risk of prolonged ischemic organ injury, and negatively affecting post-transplant outcomes. In this study, we trained and externally validated an ODE-RNN model, which combines a recurrent neural network with neural ordinary differential equations and excels in processing irregularly sampled time series data. The model is designed to predict time-to-death following terminal extubation in the intensive care unit (ICU) using the last 24 hours of clinical observations. Our model was trained on a cohort of 3,238 patients from Yale New Haven Hospital, and validated on an external cohort of 1,908 patients from six hospitals across Connecticut. The model achieved accuracies of 95.3 ± 1.0% and 95.4 ± 0.7% for predicting whether death would occur in the first 30 and 60 minutes, respectively, with a calibration error of 0.024 ± 0.009. Heart rate, respiratory rate, mean arterial blood pressure (MAP), oxygen saturation (SpO2), and Glasgow Coma Scale (GCS) scores were identified as the most important predictors. Surpassing existing clinical scores, our model sets the stage for reduced organ acquisition costs and improved post-transplant outcomes.
InfoGain Wavelets: Furthering the Design of Diffusion Wavelets for Graph-Structured Data
David R. Johnson
Michael Perlmutter
Diffusion wavelets extract information from graph signals at different scales of resolution by utilizing graph diffusion operators raised to various powers, known as diffusion scales. Traditionally, the diffusion scales are chosen to be dyadic integers,
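
As a rough illustration of the traditional dyadic construction mentioned in this abstract, the sketch below builds wavelet operators as differences of a diffusion operator raised to successive powers of two. It assumes the lazy random walk operator P = ½(I + AD⁻¹) that is common in the diffusion wavelet literature; the function names and the toy path graph are hypothetical, and this is not the InfoGain Wavelets method itself, which concerns how the scales are selected.

```python
import numpy as np

def lazy_random_walk(adjacency):
    """Return P = 0.5 * (I + A D^{-1}) for an undirected graph (assumed operator)."""
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=0)
    if np.any(degrees == 0):
        raise ValueError("every node needs at least one edge")
    # Dividing by a 1-D array broadcasts over columns, giving A D^{-1}.
    return 0.5 * (np.eye(A.shape[0]) + A / degrees)

def dyadic_diffusion_wavelets(P, num_scales):
    """Return [I - P, P - P^2, P^2 - P^4, ...] and the low-pass operator P^(2^J)."""
    n = P.shape[0]
    wavelets = [np.eye(n) - P]        # finest scale
    power = P.copy()                  # holds P^(2^(j-1)), starting at P^1
    for _ in range(num_scales):
        next_power = power @ power    # squaring doubles the diffusion scale
        wavelets.append(power - next_power)
        power = next_power
    return wavelets, power

# Toy example: a path graph on four nodes (hypothetical data).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
P = lazy_random_walk(A)
wavelets, low_pass = dyadic_diffusion_wavelets(P, num_scales=3)

# Apply every wavelet to a signal concentrated on the first node.
signal = np.array([1.0, 0.0, 0.0, 0.0])
coefficients = np.stack([W @ signal for W in wavelets])
```

Because the differences telescope, the wavelet operators plus the final low-pass operator sum to the identity, so no part of the signal is lost across scales.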
DiffKillR: Killing and Recreating Diffeomorphisms for Cell Annotation in Dense Microscopy Images
Chen Liu
Danqi Liao
Alejandro Parada-Mayorga
Alejandro Ribeiro
Marcello DiStasio
The proliferation of digital microscopy images, driven by advances in automated whole slide scanning, presents significant opportunities for biomedical research and clinical diagnostics. However, accurately annotating densely packed information in these images remains a major challenge. To address this, we introduce DiffKillR, a novel framework that reframes cell annotation as the combination of archetype matching and image registration tasks. DiffKillR employs two complementary neural networks: one that learns a diffeomorphism-invariant feature space for robust cell matching and another that computes the precise warping field between cells for annotation mapping. Using a small set of annotated archetypes, DiffKillR efficiently propagates annotations across large microscopy images, reducing the need for extensive manual labeling. More importantly, it is suitable for any type of pixel-level annotation. We will discuss the theoretical properties of DiffKillR and validate it on three microscopy tasks, demonstrating its advantages over existing supervised, semi-supervised, and unsupervised methods.
Hyperedge Representations with Hypergraph Wavelets: Applications to Spatial Transcriptomics
Xingzhi Sun
Charles Xu
João Felipe Rocha
Chen Liu
Benjamin Hollander-Bodie
Laney Goldman
Marcello DiStasio
Michael Perlmutter
In many data-driven applications, higher-order relationships among multiple objects are essential in capturing complex interactions. Hypergraphs, which generalize graphs by allowing edges to connect any number of nodes, provide a flexible and powerful framework for modeling such higher-order relationships. In this work, we introduce hypergraph diffusion wavelets and describe their favorable spectral and spatial properties. We demonstrate their utility for biomedical discovery in spatially resolved transcriptomics by applying the method to represent disease-relevant cellular niches for Alzheimer’s disease.