Portrait of Amin Emad

Amin Emad

Associate Academic Member
McGill University, Department of Electrical and Computer Engineering
Research Topics
Computational Biology
Deep Learning
Drug Discovery
Epigenomics
Generative Models
Genomics
Medical Machine Learning
Microbiome
Molecular Modeling
Network Science
Out-of-Distribution (OOD) Generalization
Transcriptomics

Biography

Amin Emad is the director of COMBINE lab (Computational Biology and Artificial Intelligence). He is an Associate professor in the Department of Electrical and Computer Engineering at McGill University and an Associate Academic member of Mila – Quebec Artificial Intelligence Institute.

He is affiliated with McGill’s Rosalind and Morris Goodman Cancer Institute, the McGill initiative in Computational Medicine (MiCM), McGill’s Quantitative Life Sciences (QLS) program, and the Meakins-Christie Laboratories at the McGill University Hospital Centre.

Before joining McGill, Emad was a Postdoctoral Research Associate at the NIH-funded KnowEnG – A Center of Excellence in Big Data Computing, which is associated with the Department of Computer Science and the Institute for Genomic Biology at the University of Illinois at Urbana-Champaign (UIUC). He received his PhD from UIUC.

Current Students

Undergraduate - McGill University
PhD - McGill University
Master's Research - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University
Postdoctorate - McGill University

Publications

GRouNdGAN: GRN-guided simulation of single-cell RNA-seq data using causal generative adversarial networks
Abdulrahman Takiddeen
We introduce GRouNdGAN, a gene regulatory network (GRN)-guided causal implicit generative model for simulating single-cell RNA-seq data, in-… (see more)silico perturbation experiments, and benchmarking GRN inference methods. Through the imposition of a user-defined GRN in its architecture, GRouNdGAN simulates steady-state and transient-state single-cell datasets where genes are causally expressed under the control of their regulating transcription factors (TFs). Training on three experimental datasets, we show that our model captures non-linear TF-gene dependences and preserves gene identities, cell trajectories, pseudo-time ordering, and technical and biological noise, with no user manipulation and only implicit parameterization. Despite imposing rigid causality constraints, it outperforms state-of-the-art simulators in generating realistic cells. GRouNdGAN learns meaningful causal regulatory dynamics, allowing sampling from both observational and interventional distributions. This enables it to synthesize cells under conditions that do not occur in the dataset at inference time, allowing to perform in-silico TF knockout experiments. Our results show that in-silico knockout of cell type-specific TFs significantly reduces cells of that type being generated. Interactions imposed through the GRN are emphasized in the simulated datasets, resulting in GRN inference algorithms assigning them much higher scores than interactions not imposed but of equal importance in the experimental training dataset. Benchmarking various GRN inference algorithms reveals that GRouNdGAN effectively bridges the existing gap between simulated and biological data benchmarks of GRN inference algorithms, providing gold standard ground truth GRNs and realistic cells corresponding to the biological system of interest. Our results show that GRouNdGAN is a stable, realistic, and effective simulator with various applications in single-cell RNA-seq analysis.
Deciphering lineage-relevant gene regulatory networks during endoderm formation by InPheRNo-ChIP
William A. Pastor
Deciphering the underlying gene regulatory networks (GRNs) that govern early human embryogenesis is critical for understanding developmental… (see more) mechanisms yet remains challenging due to limited sample availability and the inherent complexity of the biological processes involved. To address this, we developed InPheRNo-ChIP, a computational framework that integrates multimodal data, including RNA-seq, transcription factor (TF)–specific ChIP-seq, and phenotypic labels, to reconstruct phenotype-relevant GRNs associated with endoderm development. The core of this method is a probabilistic graphical model that models the simultaneous effect of TFs on their putative target genes to influence a particular phenotypic outcome. Unlike the majority of existing GRN inference methods that are agnostic to the phenotypic outcomes, InPheRNo-ChIP directly incorporates phenotypic information during GRN inference, enabling the distinction between lineage-specific and general regulatory interactions. We integrated data from three experimental studies and applied InPheRNo-ChIP to infer the GRN governing the differentiation of human embryonic stem cells into definitive endoderm. Benchmarking against a scRNA-seq CRISPRi study demonstrated InPheRNo-ChIP’s ability to identify regulatory interactions involving endoderm markers FOXA2, SMAD2, and SOX17, outperforming other methods. This highlights the importance of incorporating the phenotypic context during network inference. Furthermore, an ablation study confirms the synergistic contribution of ChIP-seq, RNA-seq, and phenotypic data, highlighting the value of multimodal integration for accurate phenotype-relevant GRN reconstruction.
INTREPPPID—an orthologue-informed quintuplet network for cross-species prediction of protein-protein interaction
An overwhelming majority of protein–protein interaction (PPI) studies are conducted in a select few model organisms largely due to constra… (see more)ints in time and cost of the associated ‘wet lab’ experiments. In silico PPI inference methods are ideal tools to overcome these limitations, but often struggle with cross-species predictions. We present INTREPPPID, a method that incorporates orthology data using a new ‘quintuplet’ neural network, which is constructed with five parallel encoders with shared parameters. INTREPPPID incorporates both a PPI classification task and an orthologous locality task. The latter learns embeddings of orthologues that have small Euclidean distances between them and large distances between embeddings of all other proteins. INTREPPPID outperforms all other leading PPI inference methods tested on both the intraspecies and cross-species tasks using strict evaluation datasets. We show that INTREPPPID’s orthologous locality loss increases performance because of the biological relevance of the orthologue data and not due to some other specious aspect of the architecture. Finally, we introduce PPI.bio and PPI Origami, a web server interface for INTREPPPID and a software tool for creating strict evaluation datasets, respectively. Together, these two initiatives aim to make both the use and development of PPI inference tools more accessible to the community.
Validation of ANG-1 and P-SEL as biomarkers of post-COVID-19 conditions using data from the Biobanque québécoise de la COVID-19 (BQC-19)
Eric Yamga
Alain Piché
Madeleine Durand
Simon Rousseau
Analysis of gene expression and use of connectivity mapping to identify drugs for treatment of human glomerulopathies
Chen-Fang Chung
Joan Papillon
José R. Navarro-Betancourt
Julie Guillemette
Ameya Bhope
Andrey V. Cybulsky
A circulating proteome-informed prognostic model of COVID-19 disease activity that relies on 1 routinely available clinical laboratories 2
Karine Tremblay
Simon Rousseau
Abstract
Interpretable deep learning architectures for improving drug response prediction performance: myth or reality?
Motivation: Recent advances in deep learning model development have enabled more accurate prediction of drug response in cancer. However, th… (see more)e black-box nature of these models still remains a hurdle in their adoption for precision cancer medicine. Recent efforts have focused on making these models interpretable by incorporating signaling pathway information in model architecture. While these models improve interpretability, it is unclear whether this higher interpretability comes at the cost of less accurate predictions, or a prediction improvement can also be obtained. Results: In this study, we comprehensively and systematically assessed four state-of-the-art interpretable models developed for drug response prediction to answer this question using three pathway collections. Our results showed that models that explicitly incorporate pathway information in the form of a latent layer perform worse compared to models that incorporate this information implicitly. Moreover, in most evaluation setups the best performance is achieved using a simple black-box model. In addition, replacing the signaling pathways with randomly generated pathways shows a comparable performance for the majority of these interpretable models. Our results suggest that new interpretable models are necessary to improve the drug response prediction performance. In addition, the current study provides different baseline models and evaluation setups necessary for such new models to demonstrate their superior prediction performance. Availability and Implementation: Implementation of all methods are provided inhttps://github.com/Emad-COMBINE-lab/InterpretableAI_for_DRP. Generated uniform datasets are inhttps://zenodo.org/record/7101665#.YzS79HbMKUk. Contact:amin.emad@mcgill.caSupplementary Information: Online-only supplementary data is available at the journal’s website.
MARSY: a multitask deep-learning framework for prediction of drug combination synergy scores
Mohamed Reda El Khili
Combination therapies have emerged as a treatment strategy for cancers to reduce the probability of drug resistance and to improve outcomes.… (see more) Large databases curating the results of many drug screening studies on preclinical cancer cell lines have been developed, capturing the synergistic and antagonistic effects of combination of drugs in different cell lines. However, due to the high cost of drug screening experiments and the sheer size of possible drug combinations, these databases are quite sparse. This necessitates the development of transductive computational models to accurately impute these missing values. Here, we developed MARSY, a deep-learning multitask model that incorporates information on the gene expression profile of cancer cell lines, as well as the differential expression signature induced by each drug to predict drug-pair synergy scores. By utilizing two encoders to capture the interplay between the drug pairs, as well as the drug pairs and cell lines, and by adding auxiliary tasks in the predictor, MARSY learns latent embeddings that improve the prediction performance compared to state-of-the-art and traditional machine-learning models. Using MARSY, we then predicted the synergy scores of 133 722 new drug-pair cell line combinations, which we have made available to the community as part of this study. Moreover, we validated various insights obtained from these novel predictions using independent studies, confirming the ability of MARSY in making accurate novel predictions. An implementation of the algorithms in Python and cleaned input datasets are provided in https://github.com/Emad-COMBINE-lab/MARSY.
Preclinical-to-clinical Anti-cancer Drug Response Prediction and Biomarker Identification Using TINDL
Lixuan Wei
Liewei Wang
Junmei Cairns
Prediction of the response of cancer patients to different treatments and identification of biomarkers of drug response are two major goals … (see more)of individualized medicine. Here, we developed a deep learning framework called TINDL, completely trained on preclinical cancer cell lines (CCLs), to predict the response of cancer patients to different treatments. TINDL utilizes a tissue-informed normalization to account for the tissue type and cancer type of the tumors and to reduce the statistical discrepancies between CCLs and patient tumors. Moreover, by making the deep learning black box interpretable, this model identifies a small set of genes whose expression levels are predictive of drug response in the trained model, enabling identification of biomarkers of drug response. Using data from two large databases of CCLs and cancer tumors, we showed that this model can distinguish between sensitive and resistant tumors for 10 (out of 14) drugs, outperforming various other machine learning models. In addition, our small interfering RNA (siRNA) knockdown experiments on 10 genes identified by this model for one of the drugs (tamoxifen) confirmed that tamoxifen sensitivity is substantially influenced by all of these genes in MCF7 cells, and seven of these genes in T47D cells. Furthermore, genes implicated for multiple drugs pointed to shared mechanism of action among drugs and suggested several important signaling pathways. In summary, this study provides a powerful deep learning framework for prediction of drug response and identification of biomarkers of drug response in cancer. The code can be accessed at https://github.com/ddhostallero/tindl.
Poisson Group Testing: A Probabilistic Model for Boolean Compressed Sensing
Olgica Milenkovic
We introduce a novel probabilistic group testing framework, termed Poisson group testing, in which the number of defectives follows a right-… (see more)truncated Poisson distribution. The Poisson model has a number of new applications, including dynamic testing with diminishing relative rates of defectives. We consider both nonadaptive and semi-adaptive identification methods. For nonadaptive methods, we derive a lower bound on the number of tests required to identify the defectives with a probability of error that asymptotically converges to zero; in addition, we propose test matrix constructions for which the number of tests closely matches the lower bound. For semiadaptive methods, we describe a lower bound on the expected number of tests required to identify the defectives with zero error probability. In addition, we propose a stage-wise reconstruction algorithm for which the expected number of tests is only a constant factor away from the lower bound. The methods rely only on an estimate of the average number of defectives, rather than on the individual probabilities of subjects being defective.