
Smita Krishnaswamy

Affiliate Member
Associate Professor, Yale University
Université de Montréal
Yale
Research Topics
AI in Health
Brain-computer Interfaces
Cognitive Science
Computational Biology
Computational Neuroscience
Data Geometry
Data Science
Data Sparsity
Deep Learning
Dynamical Systems
Generative Models
Geometric Deep Learning
Graph Neural Networks
Information Theory
Manifold Learning
Molecular Modeling
Representation Learning
Spectral Learning

Biography

Our lab develops foundational mathematical machine learning and deep learning methods that incorporate graph-based learning, signal processing, information theory, data geometry and topology, optimal transport, and dynamics modeling, and that are capable of exploratory analysis, scientific inference, interpretation, and hypothesis generation on big biomedical datasets ranging from single-cell data, to brain imaging, to molecular structural datasets arising from neuroscience, psychology, stem cell biology, cancer biology, healthcare, and biochemistry. Our work has been instrumental in dynamic trajectory learning from static snapshot data, data denoising, visualization, network inference, molecular structure modeling, and more.

Publications

AAnet resolves a continuum of spatially-localized cell states to unveil tumor complexity
Aarthi Venkat
Scott E. Youlten
Beatriz P. San Juan
Carley Purcell
Matthew Amodio
Daniel B. Burkhardt
Andrew Benz
Jeff Holst
Cerys McCool
Annelie Mollbrink
Joakim Lundeberg
David van Dijk
Leonard D. Goldstein
Sarah Kummerfeld
Christine L. Chaffer
Identifying functionally important cell states and structure within a heterogeneous tumor remains a significant biological and computational challenge. Moreover, current clustering or trajectory-based computational models are ill-equipped to address the notion that cancer cells reside along a phenotypic continuum. To address this, we present Archetypal Analysis network (AAnet), a neural network that learns key archetypal cell states within a phenotypic continuum of cell states in single-cell data. Applied to single-cell RNA sequencing data from pre-clinical models and a cohort of 34 clinical breast cancers, AAnet identifies archetypes that resolve distinct biological cell states and processes, including cell proliferation, hypoxia, metabolism and immune interactions. Notably, archetypes identified in primary tumors are recapitulated in matched liver, lung and lymph node metastases, demonstrating that a significant component of intratumoral heterogeneity is driven by cell-intrinsic properties. Using spatial transcriptomics as orthogonal validation, AAnet-derived archetypes show discrete spatial organization within tumors, supporting their distinct archetypal biology. We further reveal that ligand:receptor cross-talk between cancer and adjacent stromal cells contributes to intra-archetypal biological mimicry. Finally, we use AAnet archetype identifiers to validate GLUT3 as a critical mediator of a hypoxic cell archetype harboring a cancer stem cell population, which we confirm in human triple-negative breast cancer specimens. AAnet is a powerful tool to reveal functional cell states within complex samples from multimodal single-cell data.
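For readers curious what an archetypal-analysis network can look like in code, below is a minimal sketch of the general idea: an autoencoder whose latent code lies on a simplex, so each cell is represented as a convex mixture of learned archetypes. The class name, layer sizes, and use of a softmax latent layer are illustrative assumptions; this is not the published AAnet architecture.

```python
import torch
import torch.nn as nn

class SimplexAutoencoder(nn.Module):
    """Illustrative archetypal-analysis autoencoder (not the published AAnet).

    The encoder outputs a softmax over k archetypes, so every cell is a convex
    mixture of archetypes; the decoder maps mixture weights back to expression
    space. Decoding the one-hot corners of the simplex recovers the learned
    archetypal expression profiles.
    """
    def __init__(self, n_genes, n_archetypes=4, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_genes, hidden), nn.ReLU(),
            nn.Linear(hidden, n_archetypes),
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_archetypes, hidden), nn.ReLU(),
            nn.Linear(hidden, n_genes),
        )

    def forward(self, x):
        weights = torch.softmax(self.encoder(x), dim=-1)  # point on the simplex
        return self.decoder(weights), weights

    def archetypes(self):
        # Decode the simplex corners to get archetypal expression profiles.
        corners = torch.eye(self.decoder[0].in_features)
        return self.decoder(corners)
```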
BLIS-Net: Classifying and Analyzing Signals on Graphs
Charles Xu
Laney Goldman
Valentina Guo
Benjamin Hollander-Bodie
Maedee Trank-Greene
Ian Adelstein
Edward De Brouwer
Rex Ying
Michael Perlmutter
Graph neural networks (GNNs) have emerged as a powerful tool for tasks such as node classification and graph classification. However, much less work has been done on signal classification, where the data consists of many functions (referred to as signals) defined on the vertices of a single graph. These tasks require networks designed differently from those built for traditional GNN tasks. Indeed, traditional GNNs rely on localized low-pass filters, whereas signals of interest may have intricate multi-frequency behavior and exhibit long-range interactions. This motivates us to introduce BLIS-Net (Bi-Lipschitz Scattering Net), a novel GNN that builds on the previously introduced geometric scattering transform. Our network captures both local and global signal structure, as well as both low-frequency and high-frequency information. We make several crucial changes to the original geometric scattering architecture, which we prove increase the ability of our network to capture information about the input signal, and show that BLIS-Net achieves superior performance on both synthetic and real-world data sets based on traffic flow and fMRI data.
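As context for the scattering backbone, here is a minimal sketch of first-order geometric scattering features for a single graph signal, assuming the standard lazy random-walk diffusion wavelets; BLIS-Net itself adds learnable modules and further architectural changes not shown here, so treat this only as background.

```python
import numpy as np

def geometric_scattering_features(x, A, num_scales=4, moments=(1, 2)):
    """First-order geometric scattering features of a signal x on a graph.

    Uses lazy random-walk diffusion wavelets Psi_j = P^(2^(j-1)) - P^(2^j),
    applies them to x, takes absolute values, and aggregates with statistical
    moments. This is the classic scattering construction BLIS-Net builds on,
    not the full BLIS-Net architecture.
    """
    D_inv = np.diag(1.0 / A.sum(axis=1))
    P = 0.5 * (np.eye(A.shape[0]) + A @ D_inv)     # lazy random-walk operator
    powers = [np.linalg.matrix_power(P, 2 ** j) for j in range(num_scales + 1)]
    feats = []
    for j in range(1, num_scales + 1):
        wavelet = powers[j - 1] - powers[j]        # Psi_j at dyadic scale 2^j
        response = np.abs(wavelet @ x)
        feats.extend(np.sum(response ** q) for q in moments)
    return np.array(feats)

# Toy usage: a spike signal on a small ring graph
A = np.roll(np.eye(8), 1, axis=1); A = A + A.T
x = np.zeros(8); x[0] = 1.0
print(geometric_scattering_features(x, A))
```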
Directed Scattering for Knowledge Graph-Based Cellular Signaling Analysis
Aarthi Venkat
Joyce Chew
Ferran Cardoso Rodriguez
Christopher J. Tape
Michael Perlmutter
Directed graphs are a natural model for many phenomena, in particular scientific knowledge graphs such as molecular interaction or chemical reaction networks that define cellular signaling relationships. In these situations, source nodes typically have distinct biophysical properties from sinks. Due to their ordered and unidirectional relationships, many such networks also have hierarchical and multiscale structure. However, the majority of methods performing node- and edge-level tasks in machine learning do not take these properties into account, and thus have not been leveraged effectively for scientific tasks such as cellular signaling network inference. We propose a new framework called Directed Scattering Autoencoder (DSAE) which uses a directed version of a geometric scattering transform, combined with the non-linear dimensionality reduction properties of an autoencoder and the geometric properties of hyperbolic space to learn latent hierarchies. We show this method outperforms numerous others on tasks such as embedding directed graphs and learning cellular signaling networks.
Bayesian Spectral Graph Denoising with Smoothness Prior
Samuel Leone
Xingzhi Sun
Michael Perlmutter
Here we consider the problem of denoising features associated with complex data, modeled as signals on a graph, via a smoothness prior. This is motivated in part by settings such as single-cell RNA sequencing, where the data is very high-dimensional but its structure can be captured via an affinity graph. This allows us to utilize ideas from graph signal processing. In particular, we present algorithms for the cases where the signal is perturbed by Gaussian noise, dropout, and uniformly distributed noise. The signals are assumed to follow a prior distribution defined in the frequency domain that favors signals which are smooth across the edges of the graph. By pairing this prior distribution with our three models of noise generation, we propose Maximum A Posteriori (M.A.P.) estimates of the true signal in the presence of noisy data and provide algorithms for computing the M.A.P. Finally, we demonstrate the algorithms' ability to effectively restore signals from white noise on image data and from severe dropout in single-cell RNA sequencing data.
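For the Gaussian-noise case, a graph smoothness prior leads to a simple closed-form MAP estimate; the sketch below illustrates that idea. The combinatorial Laplacian construction and the parameters `alpha` and `sigma2` are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def map_denoise_gaussian(y, A, alpha=1.0, sigma2=0.1):
    """MAP denoising of a graph signal y observed with Gaussian noise.

    Prior: p(x) proportional to exp(-alpha * x^T L x), which penalizes signals
    that vary sharply across graph edges (L is the combinatorial Laplacian).
    Likelihood: y = x + noise, noise ~ N(0, sigma2 * I).
    The posterior is Gaussian, so the MAP estimate is the linear solve below:
        argmin_x ||y - x||^2 / (2 sigma2) + alpha x^T L x
            = (I + 2 sigma2 alpha L)^{-1} y
    """
    D = np.diag(A.sum(axis=1))          # degree matrix
    L = D - A                           # combinatorial graph Laplacian
    n = A.shape[0]
    return np.linalg.solve(np.eye(n) + 2.0 * sigma2 * alpha * L, y)

# Toy usage: a path graph with a smooth signal plus noise
A = np.diag(np.ones(9), 1); A = A + A.T        # adjacency of a 10-node path
x_true = np.linspace(0.0, 1.0, 10)
y = x_true + 0.1 * np.random.randn(10)
x_hat = map_denoise_gaussian(y, A, alpha=5.0, sigma2=0.01)
```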
Abstract B049: Pancreatic beta cell stress pathways drive pancreatic ductal adenocarcinoma development in obesity
Cathy C. Garcia
Aarthi Venkat
Sherry Agabiti
Lauren Lawres
Rebecca Cardone
Richard G. Kibbey
Mandar Deepak Muzumdar
Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy
Danqi Liao
Chen Liu
Benjamin W Christensen
Maximilian Nickel
Ian Adelstein
Entropy and mutual information in neural networks provide rich information on the learning process, but they have proven difficult to compute reliably in high dimensions. Indeed, in noisy and high-dimensional data, traditional estimates in ambient dimensions approach a fixed entropy and are prohibitively hard to compute. To address these issues, we leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures. Specifically, we define diffusion spectral entropy (DSE) in neural representations of a dataset as well as diffusion spectral mutual information (DSMI) between different variables representing data. First, we show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data that outperform classic Shannon entropy, nonparametric estimation, and mutual information neural estimation (MINE). We then study the evolution of representations in classification networks with supervised learning, self-supervision, or overfitting. We observe that (1) DSE of neural representations increases during training; (2) DSMI with the class label increases during generalizable learning but stays stagnant during overfitting; (3) DSMI with the input signal shows differing trends: on MNIST it increases, while on CIFAR-10 and STL-10 it decreases. Finally, we show that DSE can be used to guide better network initialization and that DSMI can be used to predict downstream classification accuracy across 962 models on ImageNet.
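As a rough illustration of the spectral-entropy idea, the sketch below computes an entropy over the eigenvalue spectrum of a diffusion operator built from a point cloud. The Gaussian kernel, bandwidth, normalization, and diffusion time are assumptions chosen for illustration, not the paper's exact estimator.

```python
import numpy as np

def diffusion_spectral_entropy(X, bandwidth=1.0, t=1):
    """Sketch of a diffusion spectral entropy for a point cloud X (n x d).

    Builds a Gaussian-kernel diffusion operator over the data, then takes the
    Shannon entropy of its normalized eigenvalue spectrum raised to diffusion
    time t. The entropy concentrates on a few eigenvalues when the data lie
    near a low-dimensional manifold.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    K = np.exp(-d2 / bandwidth)                            # affinity kernel
    d = K.sum(axis=1)
    M = K / np.sqrt(np.outer(d, d))                        # symmetric normalization
    eigvals = np.linalg.eigvalsh(M)
    p = np.abs(eigvals) ** t
    p = p / p.sum()                                        # spectral "distribution"
    p = p[p > 0]
    return -(p * np.log(p)).sum()

# Toy usage: entropy of a noisy 1-D curve embedded in 3-D
theta = np.linspace(0, 2 * np.pi, 200)
X = np.stack([np.cos(theta), np.sin(theta), 0.01 * np.random.randn(200)], axis=1)
print(diffusion_spectral_entropy(X, bandwidth=0.5, t=2))
```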
Learnable Filters for Geometric Scattering Modules
Dhananjay Bhaskar
Kincaid MacDonald
Jackson Grady
Michael Perlmutter
Low-Dimensional Embeddings of High-Dimensional Data: Algorithms and Applications (Dagstuhl Seminar 24122).
Dmitry Kobak
Fred Hamprecht
Gal Mishne
Sebastian Damrich
Inferring dynamic regulatory interaction graphs from time series data with perturbations
Dhananjay Bhaskar
Daniel Sumner Magruder
Edward De Brouwer
Matheo Morales
Aarthi Venkat
Graph topological property recovery with heat and wave dynamics-based features on graphs
Dhananjay Bhaskar
Yanlei Zhang
Charles Xu
Xingzhi Sun
Oluwadamilola Fasina
Maximilian Nickel
Michael Perlmutter
Neural FIM for learning Fisher information metrics from point cloud data
Oluwadamilola Fasina
Yanlei Zhang
Maximilian Nickel
Ian Adelstein
Although data diffusion embeddings are ubiquitous in unsupervised learning and have proven to be a viable technique for uncovering the underlying intrinsic geometry of data, diffusion embeddings are inherently limited due to their discrete nature. To this end, we propose neural FIM, a method for computing the Fisher information metric (FIM) from point cloud data, allowing for a continuous manifold model of the data. Neural FIM creates an extensible metric space from discrete point cloud data such that information from the metric can inform us of manifold characteristics such as volume and geodesics. We demonstrate Neural FIM's utility in selecting parameters for the PHATE visualization method, as well as its ability to obtain information about local volume, illuminating branching points and cluster centers in embeddings of a toy dataset and two single-cell datasets of iPSC reprogramming and PBMCs (immune cells).
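To illustrate the kind of object being learned, the sketch below computes a Fisher information metric on input space by pulling back the categorical-family FIM through a network that outputs a probability vector. The toy network, the use of autograd Jacobians, and the softmax output are assumptions for illustration, not the paper's training procedure.

```python
import torch

def fisher_information_metric(prob_model, x):
    """Fisher information metric at a point x for a network prob_model that
    maps x to a probability vector p(.|x) (e.g. softmax outputs).

    G_ij(x) = sum_k (1/p_k) * dp_k/dx_i * dp_k/dx_j
    This is a generic FIM computation, sketched to show how an information
    metric can be pulled back through a learned map.
    """
    J = torch.autograd.functional.jacobian(prob_model, x)   # shape (k, d)
    p = prob_model(x)                                        # shape (k,)
    return J.T @ torch.diag(1.0 / p) @ J                     # shape (d, d)

# Toy usage with an assumed two-layer network ending in a softmax
net = torch.nn.Sequential(torch.nn.Linear(3, 16), torch.nn.Tanh(),
                          torch.nn.Linear(16, 5), torch.nn.Softmax(dim=-1))
x = torch.randn(3)
G = fisher_information_metric(net, x)
print(G.shape)  # metric tensor on the 3-dimensional input space
```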
Graph Fourier MMD for Signals on Graphs
While numerous methods have been proposed for computing distances between probability distributions in Euclidean space, relatively little attention has been given to computing such distances for distributions on graphs. However, there has been a marked increase in data that either lies on a graph (such as protein interaction networks) or can be modeled as a graph (single-cell data), particularly in the biomedical sciences. Thus, it becomes important to find ways to compare signals defined on such graphs. Here, we propose Graph Fourier MMD (GFMMD), a novel distance between distributions and signals on graphs. GFMMD is defined via an optimal witness function that is both smooth on the graph and maximizes the difference in expectation between the pair of distributions on the graph. We find an analytical solution to this optimization problem as well as an embedding of distributions that results from this method. We also prove several properties of this method including scale invariance and applicability to disconnected graphs. We showcase it on graph benchmark datasets as well as on single-cell RNA-sequencing data analysis. In the latter, we use the GFMMD-based gene embeddings to find meaningful gene clusters. We also propose a novel type of score for gene selection, called the gene localization score, which helps select genes for cellular state space characterization.
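The description above suggests a spectral computation of the following flavor: expand the difference of the two distributions in the graph Fourier basis and weight low-frequency disagreements more heavily, which corresponds to comparing the distributions against smooth witness functions. The inverse-eigenvalue weighting in this sketch is an assumption chosen to match that intuition, not a transcription of the paper's analytical solution.

```python
import numpy as np

def graph_fourier_distance(p, q, A, eps=1e-9):
    """Sketch of a spectral distance between two distributions p, q on the
    nodes of a graph with adjacency matrix A.

    The difference p - q is expanded in the graph Fourier basis (Laplacian
    eigenvectors) and low-frequency disagreements are weighted more heavily.
    """
    L = np.diag(A.sum(axis=1)) - A                # combinatorial Laplacian
    lam, U = np.linalg.eigh(L)                    # graph Fourier basis
    diff_hat = U.T @ (p - q)                      # Fourier coefficients of p - q
    # Skip the zero eigenvalue: the constant eigenvector carries no difference
    # between two probability distributions.
    mask = lam > eps
    return np.sqrt(np.sum(diff_hat[mask] ** 2 / lam[mask]))

# Toy usage: two distributions concentrated on opposite ends of a path graph
A = np.diag(np.ones(9), 1); A = A + A.T
p = np.zeros(10); p[:3] = 1 / 3
q = np.zeros(10); q[-3:] = 1 / 3
print(graph_fourier_distance(p, q, A))
```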