
Smita Krishnaswamy

Affiliate Member
Associate Professor, Yale University
Université de Montréal
Yale
Research Topics
AI in Health
Brain-computer Interfaces
Cognitive Science
Computational Biology
Computational Neuroscience
Data Geometry
Data Science
Data Sparsity
Deep Learning
Dynamical Systems
Generative Models
Geometric Deep Learning
Graph Neural Networks
Information Theory
Manifold Learning
Molecular Modeling
Representation Learning
Spectral Learning

Biography

Our lab develops foundational mathematical machine learning and deep learning methods that incorporate graph-based learning, signal processing, information theory, data geometry and topology, optimal transport, and dynamics modeling. These methods support exploratory analysis, scientific inference, interpretation, and hypothesis generation on big biomedical datasets, ranging from single-cell data to brain imaging to molecular structural datasets, arising from neuroscience, psychology, stem cell biology, cancer biology, healthcare, and biochemistry. Our work has been instrumental in dynamic trajectory learning from static snapshot data, data denoising, visualization, network inference, molecular structure modeling, and more.

Current Students

Collaborating researcher - Yale University
Principal supervisor :

Publications

DiffKillR: Killing and Recreating Diffeomorphisms for Cell Annotation in Dense Microscopy Images
Chen Liu
Danqi Liao
Alejandro Parada-Mayorga
Alejandro Ribeiro
Marcello DiStasio
The proliferation of digital microscopy images, driven by advances in automated whole slide scanning, presents significant opportunities for biomedical research and clinical diagnostics. However, accurately annotating densely packed information in these images remains a major challenge. To address this, we introduce DiffKillR, a novel framework that reframes cell annotation as the combination of archetype matching and image registration tasks. DiffKillR employs two complementary neural networks: one that learns a diffeomorphism-invariant feature space for robust cell matching and another that computes the precise warping field between cells for annotation mapping. Using a small set of annotated archetypes, DiffKillR efficiently propagates annotations across large microscopy images, reducing the need for extensive manual labeling. More importantly, it is suitable for any type of pixel-level annotation. We will discuss the theoretical properties of DiffKillR and validate it on three microscopy tasks, demonstrating its advantages over existing supervised, semi-supervised, and unsupervised methods.
Hyperedge Representations with Hypergraph Wavelets: Applications to Spatial Transcriptomics
Xingzhi Sun
Charles Xu
João Felipe Rocha
Chen Liu
Benjamin Hollander-Bodie
Laney Goldman
Marcello DiStasio
Michael Perlmutter
In many data-driven applications, higher-order relationships among multiple objects are essential in capturing complex interactions. Hypergraphs, which generalize graphs by allowing edges to connect any number of nodes, provide a flexible and powerful framework for modeling such higher-order relationships. In this work, we introduce hypergraph diffusion wavelets and describe their favorable spectral and spatial properties. We demonstrate their utility for biomedical discovery in spatially resolved transcriptomics by applying the method to represent disease-relevant cellular niches for Alzheimer’s disease.
Latent Representation Learning for Multimodal Brain Activity Translation
Dhananjay Bhaskar
Erica Lindsey Busch
Laurent Caplette
Rahul Singh
Nicholas B Turk-Browne
Neuroscience employs diverse neuroimaging techniques, each offering distinct insights into brain activity, from electrophysiological recordings such as EEG, which have high temporal resolution, to hemodynamic modalities such as fMRI, which have increased spatial precision. However, integrating these heterogeneous data sources remains a challenge, which limits a comprehensive understanding of brain function. We present the Spatiotemporal Alignment of Multimodal Brain Activity (SAMBA) framework, which bridges the spatial and temporal resolution gaps across modalities by learning a unified latent space free of modality-specific biases. SAMBA introduces a novel attention-based wavelet decomposition for spectral filtering of electrophysiological recordings, graph attention networks to model functional connectivity between functional brain units, and recurrent layers to capture temporal autocorrelations in brain signals. We show that the training of SAMBA, aside from achieving translation, also learns a rich representation of brain information processing. We showcase this by classifying external stimuli driving brain activity from the representation learned in hidden layers of SAMBA, paving the way for broad downstream applications in neuroscience research and clinical contexts.
Principal Curvatures Estimation with Applications to Single Cell Data
Yanlei Zhang
Xingzhi Sun
Charles Xu
Kincaid MacDonald
Dhananjay Bhaskar
Bastian Rieck
Accelerated learning of a noninvasive human brain-computer interface via manifold geometry
Erica Lindsey Busch
E. Chandra Fincke
Nicholas B Turk-Browne
Deep multimodal representations and classification of first-episode psychosis via live face processing
Rahul Singh
Yanlei Zhang
Dhananjay Bhaskar
Vinod Srihari
Cenk Tek
Xian Zhang
J. Adam Noah
Joy Hirsch
Schizophrenia is a severe psychiatric disorder associated with a wide range of cognitive and neurophysiological dysfunctions and long-term social difficulties. In this paper, we test the hypothesis that integration of multiple simultaneous acquisitions of neuroimaging, behavioral, and clinical information will be better for prediction of early psychosis than unimodal recordings. We propose a novel framework to investigate the neural underpinnings of early psychosis symptoms (which can develop into schizophrenia with age) using multimodal acquisitions of neural and behavioral recordings, including functional near-infrared spectroscopy (fNIRS), electroencephalography (EEG), and facial features. Our data acquisition paradigm is based on live face-to-face interaction in order to study the neural correlates of social cognition in first-episode psychosis (FEP). We propose a novel deep representation learning framework, Neural-PRISM, for learning joint multimodal compressed representations combining neural as well as behavioral recordings. These learned representations are subsequently used to describe, classify, and predict the severity of early psychosis in patients, as measured by the Positive and Negative Syndrome Scale (PANSS) and Global Assessment of Functioning (GAF) scores. We found that incorporating joint multimodal representations from fNIRS and EEG along with behavioral recordings enhances classification between typical controls and FEP individuals. Additionally, our results suggest that geometric and topological features such as curvatures and path signatures of the embedded trajectories of brain activity enable detection of discriminatory neural characteristics in early psychosis.
HiPoNet: A Multi-View Simplicial Complex Network for High Dimensional Point-Cloud and Single-Cell Data
Hiren Madhu
Dhananjay Bhaskar
David R. Johnson
Rex Ying
Christopher Tape
Ian Adelstein
Michael Perlmutter
In this paper, we propose HiPoNet, an end-to-end differentiable neural network for regression, classification, and representation learning on high-dimensional point clouds. Our work is motivated by single-cell data, which can have very high dimensionality, exceeding the capabilities of existing methods for point clouds, which are mostly tailored for 3D data. Moreover, modern single-cell and spatial experiments now yield entire cohorts of datasets (i.e., one dataset for every patient), necessitating models that can process large, high-dimensional point clouds at scale. Most current approaches build a single nearest-neighbor graph, discarding important geometric and topological information. In contrast, HiPoNet models the point cloud as a set of higher-order simplicial complexes, with each particular complex being created using a reweighting of features. This method thus generates multiple constructs corresponding to different views of high-dimensional data, which in biology offers the possibility of disentangling distinct cellular processes. It then employs simplicial wavelet transforms to extract multiscale features, capturing both local and global topology from each view. We show that geometric and topological information is preserved in this framework both theoretically and empirically. We showcase the utility of HiPoNet on point-cloud-level tasks, involving classification and regression of entire point clouds in data cohorts. Experimentally, we find that HiPoNet outperforms other point-cloud and graph-based models on single-cell data. We also apply HiPoNet to spatial transcriptomics datasets using spatial coordinates as one of the views. Overall, HiPoNet offers a robust and scalable solution for high-dimensional data analysis.
Geometry-Aware Generative Autoencoders for Warped Riemannian Metric Learning and Generative Modeling on Data Manifolds
Xingzhi Sun
Danqi Liao
Kincaid MacDonald
Yanlei Zhang
Chen Liu
Ian Adelstein
Tim G. J. Rudner
Rapid growth of high-dimensional datasets in fields such as single-cell RNA sequencing and spatial genomics has led to unprecedented opportunities for scientific discovery, but it also presents unique computational and statistical challenges. Traditional methods struggle with geometry-aware data generation, interpolation along meaningful trajectories, and transporting populations via feasible paths. To address these issues, we introduce Geometry-Aware Generative Autoencoder (GAGA), a novel framework that combines extensible manifold learning with generative modeling. GAGA constructs a neural network embedding space that respects the intrinsic geometries discovered by manifold learning and learns a novel warped Riemannian metric on the data space. This warped metric is derived from both the points on the data manifold and negative samples off the manifold, allowing it to characterize a meaningful geometry across the entire latent space. Using this metric, GAGA can uniformly sample points on the manifold, generate points along geodesics, and interpolate between populations across the learned manifold using geodesic-guided flows. GAGA shows competitive performance in simulated and real-world datasets, including a 30% improvement over the state-of-the-art methods in single-cell population-level trajectory inference.
Mapping the gene space at single-cell resolution with gene signal pattern analysis
Aarthi Venkat
Martina Damo
Samuel Leone
Scott E. Youlten
Nikhil S. Joshi
Eric Fagerberg
John Attanasio
Michael Perlmutter
In single-cell sequencing analysis, several computational methods have been developed to map the cellular state space, but little has been done to map or create embeddings of the gene space. Here, we formulate the gene embedding problem, design tasks with simulated single-cell data to evaluate representations, and establish ten relevant baselines. We then present a graph signal processing approach we call gene signal pattern analysis (GSPA) that learns rich gene representations from single-cell data using a dictionary of diffusion wavelets on the cell-cell graph. GSPA enables characterization of genes based on their patterning on the cellular manifold. It also captures how localized or diffuse the expression of a gene is, for which we present a score called the gene localization score. We motivate and demonstrate the efficacy of GSPA as a framework for a range of biological tasks, such as capturing gene coexpression modules, condition-specific enrichment, and perturbation-specific gene-gene interactions. Then, we showcase the broad utility of gene representations derived from GSPA, including for cell-cell communication (GSPA-LR), spatial transcriptomics (GSPA-multimodal), and patient response (GSPA-Pt) analysis.
Beta cells are essential drivers of pancreatic ductal adenocarcinoma development
Cathy C. Garcia
Aarthi Venkat
Daniel C. McQuaid
Sherry Agabiti
Rebecca L. Cardone
Rebecca Starble
Akin Sogunro
Jeremy B. Jacox
Christian F. Ruiz
Richard G. Kibbey
Mandar Deepak Muzumdar
Pancreatic endocrine-exocrine crosstalk plays a key role in normal physiology and disease. For instance, endocrine islet beta (β) cell secretion of insulin or cholecystokinin (CCK) promotes progression of pancreatic ductal adenocarcinoma (PDAC), an exocrine cell-derived tumor. However, the cellular and molecular mechanisms that govern endocrine-exocrine signaling in tumorigenesis remain incompletely understood. We find that β cell ablation impedes PDAC development in mice, arguing that the endocrine pancreas is critical for exocrine tumorigenesis. Conversely, obesity induces β cell hormone dysregulation, alters CCK-dependent peri-islet exocrine cell transcriptional states, and enhances islet-proximal tumor formation. Single-cell RNA-sequencing, in silico latent-space archetypal and trajectory analysis, and genetic lineage tracing in vivo reveal that obesity stimulates postnatal immature β cell expansion and adaptation towards a pro-tumorigenic CCK+ state via JNK/cJun stress-responsive signaling. These results define endocrine-exocrine signaling as a driver of PDAC development and uncover new avenues to target the endocrine pancreas to subvert exocrine tumorigenesis.
Exploring the Manifold of Neural Networks Using Diffusion Geometry
Elliott Abel
Peyton Crevasse
Yvan Grinspan
Selma Mazioud
Folu Ogundipe
Kristof Reimann
Ellie Schueler
Andrew J. Steindl
Ellen Zhang
Dhananjay Bhaskar
Yanlei Zhang
Tim G. J. Rudner
Ian Adelstein
Drawing motivation from the manifold hypothesis, which posits that most high-dimensional data lies on or near low-dimensional manifolds, we apply manifold learning to the space of neural networks. We learn manifolds where datapoints are neural networks by introducing a distance between the hidden layer representations of the neural networks. These distances are then fed to the non-linear dimensionality reduction algorithm PHATE to create a manifold of neural networks. We characterize this manifold using features of the representation, including class separation, hierarchical cluster structure, spectral entropy, and topological structure. Our analysis reveals that high-performing networks cluster together in the manifold, displaying consistent embedding patterns across all these features. Finally, we demonstrate the utility of this approach for guiding hyperparameter optimization and neural architecture search by sampling from the manifold.
ImmunoStruct: Integration of protein sequence, structure, and biochemical properties for immunogenicity prediction and interpretation
Kevin Bijan Givechian
João Felipe Rocha
Edward Yang
Chen Liu
Kerrie Greene
Rex Ying
Etienne Caron
Akiko Iwasaki