
Smita Krishnaswamy

Affiliate Member
Associate Professor, Yale University
Université de Montréal
Yale
Research Topics
AI in Health
Brain-Computer Interfaces
Cognitive Science
Computational Biology
Computational Neuroscience
Data Geometry
Data Science
Data Sparsity
Deep Learning
Dynamical Systems
Generative Models
Geometric Deep Learning
Graph Neural Networks
Information Theory
Manifold Learning
Molecular Modeling
Representation Learning
Spectral Learning

Biography

Our laboratory develops fundamental mathematical methods for machine learning and deep learning that integrate graph-based learning, signal processing, information theory, data geometry and topology, optimal transport, and dynamical modeling. These methods support exploratory analysis, scientific inference, interpretation, and hypothesis generation on large biomedical datasets, ranging from single-cell data to brain imaging to molecular structural data, drawn from neuroscience, psychology, stem cell biology, cancer biology, healthcare, and biochemistry. Our work has been instrumental in learning dynamic trajectories from static snapshot data, data denoising, visualization, network inference, molecular structure modeling, and much more.

Publications

Geometry-Aware Generative Autoencoders for Warped Riemannian Metric Learning and Generative Modeling on Data Manifolds
Xingzhi Sun
Danqi Liao
Kincaid MacDonald
Yanlei Zhang
Chen Liu
Guillaume Huguet
Ian Adelstein
Tim G. J. Rudner
Mapping the gene space at single-cell resolution with gene signal pattern analysis
Aarthi Venkat
Martina Damo
Samuel Leone
Scott E. Youlten
Nikhil S. Joshi
Eric Fagerberg
John Attanasio
Michael Perlmutter
In single-cell sequencing analysis, several computational methods have been developed to map the cellular state space, but little has been done to map or create embeddings of the gene space. Here, we formulate the gene embedding problem, design tasks with simulated single-cell data to evaluate representations, and establish ten relevant baselines. We then present a graph signal processing approach we call gene signal pattern analysis (GSPA) that learns rich gene representations from single-cell data using a dictionary of diffusion wavelets on the cell-cell graph. GSPA enables characterization of genes based on their patterning on the cellular manifold. It also captures how localized or diffuse the expression of a gene is, for which we present a score called the gene localization score. We motivate and demonstrate the efficacy of GSPA as a framework for a range of biological tasks, such as capturing gene coexpression modules, condition-specific enrichment, and perturbation-specific gene-gene interactions. Then, we showcase the broad utility of gene representations derived from GSPA, including for cell-cell communication (GSPA-LR), spatial transcriptomics (GSPA-multimodal), and patient response (GSPA-Pt) analysis.
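To illustrate the diffusion-wavelet idea behind the abstract, here is a minimal sketch assuming a cells-by-genes expression matrix: build a cell-cell kNN graph, form dyadic diffusion wavelets, and collect each gene's wavelet coefficients as its representation. The kNN construction, wavelet scales, and plain concatenation are illustrative simplifications (GSPA itself learns a compressed embedding), not the authors' implementation.

```python
# Minimal sketch of gene representations from diffusion wavelets on a
# cell-cell graph; parameter choices are illustrative assumptions.
import numpy as np
from sklearn.neighbors import kneighbors_graph

def gene_embeddings(X, n_neighbors=15, n_scales=4):
    """X: cells x genes expression matrix. Returns one feature row per gene."""
    # Symmetrized cell-cell kNN affinity graph.
    A = kneighbors_graph(X, n_neighbors, mode="connectivity").toarray()
    A = np.maximum(A, A.T)
    # Row-normalized diffusion operator P.
    P = A / A.sum(axis=1, keepdims=True)
    # Dyadic diffusion wavelets: W_j = P^(2^j) - P^(2^(j+1)).
    powers = [np.linalg.matrix_power(P, 2 ** j) for j in range(n_scales + 1)]
    wavelets = [powers[j] - powers[j + 1] for j in range(n_scales)]
    # Treat each gene's expression across cells as a graph signal and
    # stack the magnitudes of its wavelet coefficients across scales.
    feats = [np.abs(W @ X).T for W in wavelets]   # each block: genes x cells
    return np.hstack(feats)
```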
Beta cells are essential drivers of pancreatic ductal adenocarcinoma development
Cathy C. Garcia
Aarthi Venkat
Daniel C. McQuaid
Sherry Agabiti
Alex Tong
Rebecca L. Cardone
Rebecca Starble
Akin Sogunro
Jeremy B. Jacox
Christian F. Ruiz
Richard G. Kibbey
Mandar Deepak Muzumdar
Pancreatic endocrine-exocrine crosstalk plays a key role in normal physiology and disease. For instance, endocrine islet beta (β) cell secretion of insulin or cholecystokinin (CCK) promotes progression of pancreatic adenocarcinoma (PDAC), an exocrine cell-derived tumor. However, the cellular and molecular mechanisms that govern endocrine-exocrine signaling in tumorigenesis remain incompletely understood. We find that β cell ablation impedes PDAC development in mice, arguing that the endocrine pancreas is critical for exocrine tumorigenesis. Conversely, obesity induces β cell hormone dysregulation, alters CCK-dependent peri-islet exocrine cell transcriptional states, and enhances islet proximal tumor formation. Single-cell RNA-sequencing, in silico latent-space archetypal and trajectory analysis, and genetic lineage tracing in vivo reveal that obesity stimulates postnatal immature β cell expansion and adaptation towards a pro-tumorigenic CCK+ state via JNK/cJun stress-responsive signaling. These results define endocrine-exocrine signaling as a driver of PDAC development and uncover new avenues to target the endocrine pancreas to subvert exocrine tumorigenesis.
Exploring the Manifold of Neural Networks Using Diffusion Geometry
Elliott Abel
Peyton Crevasse
Yvan Grinspan
Selma Mazioud
Folu Ogundipe
Kristof Reimann
Ellie Schueler
Andrew J. Steindl
Ellen Zhang
Dhananjay Bhaskar
Siddharth Viswanath
Yanlei Zhang
Tim G. J. Rudner
Ian Adelstein
Drawing motivation from the manifold hypothesis, which posits that most high-dimensional data lies on or near low-dimensional manifolds, we apply manifold learning to the space of neural networks. We learn manifolds where datapoints are neural networks by introducing a distance between the hidden layer representations of the neural networks. These distances are then fed to the non-linear dimensionality reduction algorithm PHATE to create a manifold of neural networks. We characterize this manifold using features of the representation, including class separation, hierarchical cluster structure, spectral entropy, and topological structure. Our analysis reveals that high-performing networks cluster together in the manifold, displaying consistent embedding patterns across all these features. Finally, we demonstrate the utility of this approach for guiding hyperparameter optimization and neural architecture search by sampling from the manifold.
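As a rough illustration of the embedding step (compare networks through their hidden-layer representations, then reduce with PHATE), the sketch below summarizes each network by its hidden activations on a shared probe batch and embeds the collection with the phate package. The flattened-activation features stand in for the paper's representation distance and are an assumption, not the authors' construction.

```python
# Minimal sketch: embed a collection of networks with PHATE, using their
# hidden-layer activations on a common probe batch as features.
import numpy as np
import phate  # PHATE package from the Krishnaswamy lab (pip install phate)

def embed_networks(activations, n_components=2):
    """activations: list of (probe_size x hidden_dim) arrays, one per network,
    computed on the same probe inputs so the networks are comparable."""
    feats = np.stack([a.ravel() for a in activations])
    return phate.PHATE(n_components=n_components).fit_transform(feats)

# Toy usage: 20 "networks" with random hidden representations.
rng = np.random.default_rng(0)
coords = embed_networks([rng.normal(size=(32, 16)) for _ in range(20)])
```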
Deep Learning Unlocks the True Potential of Organ Donation after Circulatory Death with Accurate Prediction of Time-to-Death
Xingzhi Sun
Edward De Brouwer
Chen Liu
Ramesh Batra
Increasing the number of organ donations after circulatory death (DCD) has been identified as one of the most important ways of addressing the ongoing organ shortage. While recent technological advances in organ transplantation have increased their success rate, a substantial challenge in increasing the number of DCD donations resides in the uncertainty regarding the timing of cardiac death after terminal extubation, impacting the risk of prolonged ischemic organ injury, and negatively affecting post-transplant outcomes. In this study, we trained and externally validated an ODE-RNN model, which combines recurrent neural networks with neural ordinary differential equations and excels at processing irregularly-sampled time series data. The model is designed to predict time-to-death following terminal extubation in the intensive care unit (ICU) using the last 24 hours of clinical observations. Our model was trained on a cohort of 3,238 patients from Yale New Haven Hospital, and validated on an external cohort of 1,908 patients from six hospitals across Connecticut. The model achieved accuracies of 95.3 ± 1.0% and 95.4 ± 0.7% for predicting whether death would occur in the first 30 and 60 minutes, respectively, with a calibration error of 0.024 ± 0.009. Heart rate, respiratory rate, mean arterial blood pressure (MAP), oxygen saturation (SpO2), and Glasgow Coma Scale (GCS) scores were identified as the most important predictors. Surpassing existing clinical scores, our model sets the stage for reduced organ acquisition costs and improved post-transplant outcomes.
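A minimal ODE-RNN cell in PyTorch, sketching how a hidden state can be evolved by a learned ODE between irregularly spaced observations and corrected by a GRU update at each observation. The fixed-step Euler integrator, the layer sizes, and the scalar readout are illustrative assumptions, not the trained clinical model.

```python
# Sketch of an ODE-RNN for irregularly-sampled time series, assuming
# fixed-step Euler integration between observations.
import torch
import torch.nn as nn

class ODERNN(nn.Module):
    def __init__(self, obs_dim, hidden_dim, euler_steps=5):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.euler_steps = euler_steps
        self.dynamics = nn.Sequential(              # models dh/dt
            nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim))
        self.update = nn.GRUCell(obs_dim, hidden_dim)
        self.readout = nn.Linear(hidden_dim, 1)     # e.g. predicted time-to-event

    def forward(self, times, values):
        # times: (T,) increasing observation times; values: (T, obs_dim).
        h = torch.zeros(1, self.hidden_dim)
        prev_t = times[0]
        for t, x in zip(times, values):
            dt = (t - prev_t) / self.euler_steps
            for _ in range(self.euler_steps):        # evolve h between observations
                h = h + dt * self.dynamics(h)
            h = self.update(x.unsqueeze(0), h)       # correct h at the observation
            prev_t = t
        return self.readout(h)

# Toy usage with three observations of a 5-dimensional vital-sign vector.
model = ODERNN(obs_dim=5, hidden_dim=32)
pred = model(torch.tensor([0.0, 1.5, 4.0]), torch.randn(3, 5))
```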
ImmunoStruct: Integration of protein sequence, structure, and biochemical properties for immunogenicity prediction and interpretation
Kevin B. Givechian
João F. Rocha
Edward Yang
Chen Liu
Kerrie Greene
Rex Ying
Etienne Caron
Akiko Iwasaki
Epitope-based vaccines are promising therapeutic modalities for infectious diseases and cancer, but identifying immunogenic epitopes is challenging. The vast majority of prediction methods are sequence-based, and do not incorporate wide-scale structure data and biochemical properties across each peptide-MHC (pMHC) complex. We present ImmunoStruct, a deep-learning model that integrates sequence, structural, and biochemical information to predict multi-allele class-I pMHC immunogenicity. By leveraging a multimodal dataset of ~27,000 peptide-MHC complexes that we generated with AlphaFold, we demonstrate that ImmunoStruct improves immunogenicity prediction performance and interpretability beyond existing methods, across infectious disease epitopes and cancer neoepitopes. We further show strong alignment with in vitro assay results for a set of SARS-CoV-2 epitopes. This work also presents a new architecture that incorporates equivariant graph processing and multimodal data integration for this long-standing task in immunotherapy.
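To make the multimodal-integration idea concrete, here is a minimal fusion sketch in which separate encoders for sequence, structure, and biochemical features are concatenated before a prediction head. The encoders, dimensions, and class name are illustrative placeholders, not ImmunoStruct's architecture (which uses equivariant graph processing rather than plain linear encoders).

```python
# Minimal multimodal fusion sketch for immunogenicity prediction;
# all modules and sizes are hypothetical simplifications.
import torch
import torch.nn as nn

class MultimodalImmunogenicity(nn.Module):
    def __init__(self, seq_dim, struct_dim, chem_dim, hidden=64):
        super().__init__()
        self.seq_enc = nn.Sequential(nn.Linear(seq_dim, hidden), nn.ReLU())
        self.struct_enc = nn.Sequential(nn.Linear(struct_dim, hidden), nn.ReLU())
        self.chem_enc = nn.Sequential(nn.Linear(chem_dim, hidden), nn.ReLU())
        self.head = nn.Linear(3 * hidden, 1)     # immunogenicity logit

    def forward(self, seq_feat, struct_feat, chem_feat):
        # Encode each modality separately, then fuse by concatenation.
        z = torch.cat([self.seq_enc(seq_feat),
                       self.struct_enc(struct_feat),
                       self.chem_enc(chem_feat)], dim=-1)
        return self.head(z)
```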
ProtSCAPE: Mapping the landscape of protein conformations in molecular dynamics
Siddharth Viswanath
Dhananjay Bhaskar
David R. Johnson
João F. Rocha
Egbert Castro
Jackson Grady
Alex T. Grigas
Michael Perlmutter
Corey S. O'Hern
Understanding the dynamic nature of protein structures is essential for comprehending their biological functions. While significant progress has been made in predicting static folded structures, modeling protein motions on microsecond to millisecond scales remains challenging. To address these challenges, we introduce a novel deep learning architecture, Protein Transformer with Scattering, Attention, and Positional Embedding (ProtSCAPE), which leverages the geometric scattering transform alongside transformer-based attention mechanisms to capture protein dynamics from molecular dynamics (MD) simulations. ProtSCAPE utilizes the multi-scale nature of the geometric scattering transform to extract features from protein structures conceptualized as graphs and integrates these features with dual attention structures that focus on residues and amino acid signals, generating latent representations of protein trajectories. Furthermore, ProtSCAPE incorporates a regression head to enforce temporally coherent latent representations.
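To make the geometric scattering step concrete, the sketch below computes zeroth-, first-, and second-order scattering moments of a per-residue signal on a residue contact graph. The lazy random-walk wavelets and statistical moments follow the standard geometric scattering construction; the graph, signal, and moment choices are placeholders rather than ProtSCAPE's pipeline.

```python
# Minimal second-order geometric scattering on a protein contact graph;
# inputs and scale choices are illustrative assumptions.
import numpy as np

def lazy_walk(A):
    """Lazy random-walk operator P = (I + D^-1 A) / 2 for adjacency A."""
    P = A / A.sum(axis=1, keepdims=True)
    return 0.5 * (np.eye(len(A)) + P)

def scattering_features(A, x, n_scales=4, moments=(1, 2)):
    """A: residue adjacency (n x n); x: per-residue signal (n,).
    Returns zeroth-, first-, and second-order scattering moments."""
    P = lazy_walk(A)
    powers = [np.linalg.matrix_power(P, 2 ** j) for j in range(n_scales + 1)]
    W = [powers[j] - powers[j + 1] for j in range(n_scales)]   # wavelets
    feats = [np.sum(np.abs(x) ** q) for q in moments]          # order 0
    for j in range(n_scales):
        u1 = np.abs(W[j] @ x)                                  # order 1
        feats += [np.sum(u1 ** q) for q in moments]
        for k in range(j + 1, n_scales):                       # order 2
            u2 = np.abs(W[k] @ u1)
            feats += [np.sum(u2 ** q) for q in moments]
    return np.array(feats)
```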
Convergence of Manifold Filter-Combine Networks
David R. Johnson
Joyce Chew
Siddharth Viswanath
Edward De Brouwer
Deanna Needell
Michael Perlmutter
In order to better understand manifold neural networks (MNNs), we introduce Manifold Filter-Combine Networks (MFCNs). The filter-combine framework parallels the popular aggregate-combine paradigm for graph neural networks (GNNs) and naturally suggests many interesting families of MNNs which can be interpreted as the manifold analog of various popular GNNs. We then propose a method for implementing MFCNs on high-dimensional point clouds that relies on approximating the manifold by a sparse graph. We prove that our method is consistent in the sense that it converges to a continuum limit as the number of data points tends to infinity.
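A minimal filter-combine layer on a kNN graph approximating the underlying manifold, in the spirit of the recipe described above: a per-channel filter bank (here, diffusion powers), a cross-channel combination, and a pointwise nonlinearity. The diffusion-power filters and the random, untrained combine matrix are illustrative assumptions, not the paper's implementation.

```python
# Sketch of one filter-combine layer on a sparse graph built from a point cloud.
import numpy as np
from sklearn.neighbors import kneighbors_graph

def filter_combine_layer(X, signals, n_neighbors=10, n_filters=3, rng=None):
    """X: point cloud (n x d); signals: (n x c) features on the points."""
    rng = np.random.default_rng() if rng is None else rng
    # Sparse kNN graph approximating the manifold, symmetrized.
    A = kneighbors_graph(X, n_neighbors, mode="connectivity").toarray()
    A = np.maximum(A, A.T)
    P = A / A.sum(axis=1, keepdims=True)          # diffusion operator
    # Filter step: diffuse each channel at several scales.
    filtered = np.concatenate(
        [np.linalg.matrix_power(P, 2 ** j) @ signals for j in range(n_filters)],
        axis=1)                                    # n x (c * n_filters)
    # Combine step: mix filtered channels (random, untrained mixing matrix
    # for illustration), then apply a pointwise nonlinearity.
    W = rng.normal(size=(filtered.shape[1], signals.shape[1]))
    return np.maximum(filtered @ W, 0.0)
```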