Amin Emad

Associate Academic Member
McGill University, Department of Electrical and Computer Engineering
Research Topics
Computational Biology
Deep Learning
Drug Discovery
Epigenomics
Generative Models
Genomics
Medical Machine Learning
Microbiome
Molecular Modeling
Network Science
Out-of-Distribution (OOD) Generalization
Transcriptomics

Biography

Amin Emad is the director of the COMBINE lab (Computational Biology and Artificial Intelligence). He is an Associate Professor in the Department of Electrical and Computer Engineering at McGill University and an Associate Academic Member of Mila – Quebec Artificial Intelligence Institute.

He is affiliated with McGill’s Rosalind and Morris Goodman Cancer Institute, the McGill initiative in Computational Medicine (MiCM), McGill’s Quantitative Life Sciences (QLS) program, and the Meakins-Christie Laboratories at the McGill University Health Centre.

Before joining McGill, Emad was a Postdoctoral Research Associate at the NIH-funded KnowEnG – A Center of Excellence in Big Data Computing, which is associated with the Department of Computer Science and the Institute for Genomic Biology at the University of Illinois at Urbana-Champaign (UIUC). He received his PhD from UIUC.

Current Students

Undergraduate - McGill University
PhD - McGill University
Master's Research - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University
Postdoctorate - McGill University

Publications

Unbiased characterization of COVID-19 endotypes leads to prognostication of high-risk individuals using routine blood tests
Catherine Allard
Madeleine Durand
Karine Tremblay
Simon Rousseau
Unsupervised proteomic analysis identified biologically coherent endotypes that advance understanding of acute lung injury in COVID-19 and support improved diagnostic and prognostic strategies.
A Latent Space Thermodynamic Model of Cell Differentiation
Ali Poursina
Arsham Mikaeili Namini
Alihossein Saberi
Hamed S. Najafabadi
Inferring the governing dynamics of differentiation that capture cell state evolution remains a central challenge in single-cell biology. We present Latent Space Dynamics (LSD), a thermodynamics-inspired framework that models cell differentiation as evolution on a learned Waddington landscape in latent space. LSD jointly infers a low-dimensional cell state, a differentiable potential function governing developmental flow, and a local entropy term that quantifies cellular plasticity. Using a neural ordinary differential equation, LSD reconstructs continuous differentiation trajectories from time-ordered single-cell data. Across diverse developmental systems, LSD accurately recovers lineage hierarchies, predicts fate commitment for unseen cell types, and outperforms existing trajectory inference approaches in directional accuracy. Moreover, in silico gene perturbations reveal how individual regulators reshape the landscape, and entropy provides a quantitative measure of plasticity in development and cancer.
CellPace: A temporal diffusion-forcing framework for simulation, interpolation and forecasting of single-cell dynamics
Single-cell omics technologies resolve cellular heterogeneity at high resolution but provide only static snapshots of continuous developmental processes. This makes it difficult to recover coherent temporal dynamics when developmental stages are irregularly sampled or missing. While recent generative models can simulate observed cell states, they often treat timepoints as discrete categories, hindering interpolation across gaps and extrapolation to unobserved future stages. We present CellPace, a generative model that learns and generates developmental dynamics by leveraging a transformer-based temporal diffusion backbone conditioned on continuous, gap-aware temporal encodings. Across diverse mouse developmental lineages, CellPace achieves state-of-the-art performance in simulation, interpolation, and forecasting tasks. Beyond recovering global population statistics, generated cells preserve fine-grained biological structure, retaining dynamic gene regulatory programs and mapping accurately to anatomical regions in spatial transcriptomics data. Furthermore, CellPace extends naturally to multi-modal data, modeling joint RNA-chromatin dynamics even when temporal ordering is inferred from pseudotime. Together, these results position CellPace as a robust framework for modeling and generating continuous developmental dynamics from sparse, cross-sectional single-cell data.
A flaw in using pretrained protein language models in protein–protein interaction inference models
Causal single-cell RNA-seq simulation, in silico perturbation, and GRN inference benchmarking using GRouNdGAN-Toolkit
Multi-Modal Protein Representation Learning with CLASP
A flaw in using pre-trained pLLMs in protein-protein interaction inference models
With the growing pervasiveness of pre-trained protein large language models (pLLMs), pLLM-based methods are increasingly being put forward for the protein-protein interaction (PPI) inference task. Here, we identify and confirm that existing pre-trained pLLMs are a source of data leakage for the downstream PPI task. We characterize the extent of the data leakage problem by training and comparing small and efficient pLLMs on a dataset that controls for data leakage (“strict”) with one that does not (“non-strict”). While data leakage from pre-trained pLLMs causes measurable inflation of testing scores, we find that this does not necessarily extend to other, non-paired biological tasks such as protein keyword annotation. Further, we find no connection between the context lengths of pLLMs and the performance of pLLM-based PPI inference methods on proteins with sequence lengths that surpass them. Furthermore, we show that pLLM-based and non-pLLM-based models fail to generalize in tasks such as prediction of the human-SARS-CoV-2 PPIs or the effect of point mutations on binding affinities. This study demonstrates the importance of extending existing protocols for the evaluation of pLLM-based models applied to paired biological datasets and identifies areas of weakness of current pLLM models.
Refining sequence-to-expression modelling with chromatin accessibility
Gregory Fonseca
Divergent responses to SARS-CoV-2 infection in bronchial epithelium with pre-existing respiratory diseases
Justine Oliva
Manon Ruffin
Claire Calmel
Aurélien Gibeaud
Andrés Pizzorno
Clémence Gaudin
Solenne Chardonnet
Viviane de Almeida Bastos
Manuel Rosa-Calatrava
Simon Rousseau
Harriet Corvol
Olivier Terrier
Loïc Guillot
Anticancer Monotherapy and Polytherapy Drug Response Prediction Using Deep Learning: Guidelines and Best Practices
Circulating IL-17F, but not IL-17A, is elevated in severe COVID-19 and leads to an ERK1/2 and p38 MAPK-dependent increase in ICAM-1 cell surface expression and neutrophil adhesion on endothelial cells
Jérôme Bédard-Matteau
Katelyn Yixiu Liu
Lyvia Fourcade
Douglas D. Fraser
Simon Rousseau
Severe COVID-19 is associated with neutrophilic inflammation and immunothrombosis. Several members of the IL-17 cytokine family have been associated with neutrophilic inflammation and activation of the endothelium. Therefore, we investigated whether these cytokines were associated with COVID-19. We investigated the association between COVID-19 and circulating plasma levels of IL-17 cytokine family members in participants in the Biobanque québécoise de la COVID-19 (BQC19), a prospective observational cohort, and in an independent cohort from Western University (London, Ontario). We measured the in vitro impact of IL-17F on intercellular adhesion molecule 1 (ICAM-1) cell surface expression and neutrophil adhesion on endothelial cells in culture. The contribution of two mitogen-activated protein kinase (MAPK) pathways was determined using the small-molecule inhibitors PD184352 (an MKK1/MKK2 inhibitor) and BIRB0796 (a p38 MAPK inhibitor). We found increased IL-17D and IL-17F plasma levels when comparing SARS-CoV-2-positive vs negative hospitalized participants. Moreover, increased plasma levels of IL-17D, IL-17E and IL-17F were noted when comparing severe versus mild COVID-19. IL-17F, but not IL-17A, was significantly elevated in people with COVID-19 compared to healthy controls and with more severe disease. In vitro work on endothelial cells treated with IL-17F for 24h showed an increased cell surface expression of ICAM-1 accompanied by neutrophil adhesion. The introduction of two MAPK inhibitors significantly reduced the binding of neutrophils while also reducing ICAM-1 expression at the surface level of endothelial cells, but not its intracellular expression. Overall, these results have identified an association between two cytokines of the IL-17 family (IL-17D and IL-17F) with COVID-19 and disease severity. Considering that IL-17F stimulation promotes neutrophil adhesion to the endothelium in a MAPK-dependent manner, it is attractive to speculate that this pathway may contribute to pathogenic immunothrombosis in concert with other molecular effectors.
A long-context RNA foundation model for predicting transcriptome architecture
Benedict Choi
Simai Wang
Aldo Hernández-Corchado
Mohsen Naghipourfar
Arsham Mikaeili Namini
Vijay Ramani
Hamed S. Najafabadi
Hani Goodarzi
Linking DNA sequence to genomic function remains one of the grand challenges in genetics and genomics. Here, we combine large-scale single-molecule transcriptome sequencing of diverse cancer cell lines with cutting-edge machine learning to build LoRNASH, an RNA foundation model that learns how the nucleotide sequence of unspliced pre-mRNA dictates transcriptome architecture: the relative abundances and molecular structures of mRNA isoforms. Owing to its use of the StripedHyena architecture, LoRNASH handles extremely long sequence inputs at base-pair resolution (∼65 kilobase pairs), allowing for quantitative, zero-shot prediction of all aspects of transcriptome architecture, including isoform abundance, isoform structure, and the impact of DNA sequence variants on transcript structure and abundance. We anticipate that our public data release and the accompanying frontier model will accelerate many aspects of RNA biotechnology. More broadly, we envision the use of LoRNASH as a foundation for fine-tuning of any transcriptome-related downstream prediction task, including cell-type specific gene expression, splicing, and general RNA processing.