Portrait of Amin Emad

Amin Emad

Associate Academic Member
Assistant Professor, McGill University, Department of Electrical and Computer Engineering


Amin Emad is an assistant professor in the Department of Electrical and Computer Engineering at McGill University and an associate academic member of Mila – Quebec Artificial Intelligence Institute.

He is affiliated with McGill’s Rosalind and Morris Goodman Cancer Institute, the McGill initiative in Computational Medicine (MiCM), McGill’s Quantitative Life Sciences (QLS) program, and the Meakins-Christie Laboratories at the McGill University Hospital Centre.

Before joining McGill, Emad was a postdoctoral research associate at the NIH-funded KnowEnG – A Center of Excellence in Big Data Computing, which is associated with the Department of Computer Science and the Institute for Genomic Biology at the University of Illinois at Urbana-Champaign (UIUC). He received his PhD from UIUC in 2015, his MSc from the University of Alberta in 2009, and his BSc from Sharif University of Technology (Tehran) in 2007. Emad’s research lies at the intersection of AI and computational biology.

Current Students


INTREPPPID - An Orthologue-Informed Quintuplet Network for Cross-Species Prediction of Protein-Protein Interaction
Joseph Szymborski
An overwhelming majority of protein-protein interaction (PPI) studies are conducted in a select few model organisms largely due to constrain… (see more)ts in time and cost of the associated “wet lab” experiments. In silico PPI inference methods are ideal tools to overcome these limitations, but often struggle with cross-species predictions. We present INTREPPPID, a method which incorporates orthology data using a new “quintuplet” neural network, which is constructed with five parallel encoders with shared parameters. INTREPPPID incorporates both a PPI classification task and an orthologous locality task. The latter learns embeddings of orthologues that have small Euclidean distances between them and large distances between embeddings of all other proteins. INTREPPPID outperforms all other leading PPI inference methods tested on both the intra-species and cross-species tasks using strict evaluation datasets. We show that INTREPPPID’s orthologous locality loss increases performance because of the biological relevance of the orthologue data, and not due to some other specious aspect of the architecture. Finally, we introduce PPI.bio and PPI Origami, a web server interface for INTREPPPID and a software tool for creating strict evaluation datasets, respectively. Together, these two initiatives aim to make both the use and development of PPI inference tools more accessible to the community. GRAPHICAL ABSTRACT
Validation of ANG-1 and P-SEL as biomarkers of post-COVID-19 conditions using data from the Biobanque québécoise de la COVID-19 (BQC-19)
Eric Yamga
Antoine Soulé
Alain Piché
Madeleine Durand
Simon Rousseau
GRouNdGAN: GRN-guided simulation of single-cell RNA-seq data using causal generative adversarial networks
Yazdan Zinati
Abdulrahman Takiddeen
We introduce GRouNdGAN, a gene regulatory network (GRN)-guided causal implicit generative model for simulating single-cell RNA-seq data, in-… (see more)silico perturbation experiments, and benchmarking GRN inference methods. Through the imposition of a user-defined GRN in its architecture, GRouNdGAN simulates steady-state and transient-state single-cell datasets where genes are causally expressed under the control of their regulating transcription factors (TFs). Training on three experimental datasets, we show that our model captures non-linear TF-gene dependences and preserves gene identities, cell trajectories, pseudo-time ordering, and technical and biological noise, with no user manipulation and only implicit parameterization. Despite imposing rigid causality constraints, it outperforms state-of-the-art simulators in generating realistic cells. GRouNdGAN learns meaningful causal regulatory dynamics, allowing sampling from both observational and interventional distributions. This enables it to synthesize cells under conditions that do not occur in the dataset at inference time, allowing to perform in-silico TF knockout experiments. Our results show that in-silico knockout of cell type-specific TFs significantly reduces cells of that type being generated. Interactions imposed through the GRN are emphasized in the simulated datasets, resulting in GRN inference algorithms assigning them much higher scores than interactions not imposed but of equal importance in the experimental training dataset. Benchmarking various GRN inference algorithms reveals that GRouNdGAN effectively bridges the existing gap between simulated and biological data benchmarks of GRN inference algorithms, providing gold standard ground truth GRNs and realistic cells corresponding to the biological system of interest. Our results show that GRouNdGAN is a stable, realistic, and effective simulator with various applications in single-cell RNA-seq analysis.
Interpretable deep learning architectures for improving drug response prediction performance: myth or reality?
Yihui Li
David Earl Hostallero
Motivation: Recent advances in deep learning model development have enabled more accurate prediction of drug response in cancer. However, th… (see more)e black-box nature of these models still remains a hurdle in their adoption for precision cancer medicine. Recent efforts have focused on making these models interpretable by incorporating signaling pathway information in model architecture. While these models improve interpretability, it is unclear whether this higher interpretability comes at the cost of less accurate predictions, or a prediction improvement can also be obtained. Results: In this study, we comprehensively and systematically assessed four state-of-the-art interpretable models developed for drug response prediction to answer this question using three pathway collections. Our results showed that models that explicitly incorporate pathway information in the form of a latent layer perform worse compared to models that incorporate this information implicitly. Moreover, in most evaluation setups the best performance is achieved using a simple black-box model. In addition, replacing the signaling pathways with randomly generated pathways shows a comparable performance for the majority of these interpretable models. Our results suggest that new interpretable models are necessary to improve the drug response prediction performance. In addition, the current study provides different baseline models and evaluation setups necessary for such new models to demonstrate their superior prediction performance. Availability and Implementation: Implementation of all methods are provided in https://github.com/Emad-COMBINE-lab/InterpretableAI_for_DRP. Generated uniform datasets are in https://zenodo.org/record/7101665#.YzS79HbMKUk. Contact: amin.emad@mcgill.ca Supplementary Information: Online-only supplementary data is available at the journal’s website.
MARSY: a multitask deep-learning framework for prediction of drug combination synergy scores
Mohamed Reda El Khili
Safyan Aman Memon
Motivation Combination therapies have emerged as a treatment strategy for cancers to reduce the probability of drug resistance and to improv… (see more)e outcome. Large databases curating the results of many drug screening studies on preclinical cancer cell lines have been developed, capturing the synergistic and antagonistic effects of combination of drugs in different cell lines. However, due to the high cost of drug screening experiments and the sheer size of possible drug combinations, these databases are quite sparse. This necessitates the development of transductive computational models to accurately impute these missing values. Results Here, we developed MARSY, a deep learning multi-task model that incorporates information on gene expression profile of cancer cell lines, as well as the differential expression signature induced by each drug to predict drug-pair synergy scores. By utilizing two encoders to capture the interplay between the drug-pairs, as well as the drug-pairs and cell lines, and by adding auxiliary tasks in the predictor, MARSY learns latent embeddings that improve the prediction performance compared to state-of-the-art and traditional machine learning models. Using MARSY, we then predicted the synergy scores of 133,722 new drug-pair cell line combinations, which we have made available to the community as part of this study. Moreover, we validated various insights obtained from these novel predictions using independent studies, confirming the ability of MARSY in making accurate novel predictions. Availability and Implementation An implementation of the algorithms in Python and cleaned input datasets are provided in https://github.com/Emad-COMBINE-lab/MARSY. Contact amin.emad@mcgill.ca Supplementary Information Online-only supplementary data is available at the journal’s website.
Analysis of gene expression and use of connectivity mapping to identify drugs for treatment of human glomerulopathies
Chen-Fang Chung
Joan Papillon
José R. Navarro-Betancourt
Julie Guillemette
Ameya Bhope
Andrey V. Cybulsky
Preclinical-to-clinical Anti-cancer Drug Response Prediction and Biomarker Identification Using TINDL
David Earl Hostallero
Lixuan Wei
Liewei Wang
Junmei Cairns
A circulating proteome-informed prognostic model of COVID-19 disease activity that relies on 1 routinely available clinical laboratories 2
William Ma
Antoine Soulé
Karine Tremblay
Simon Rousseau
Poisson Group Testing: A Probabilistic Model for Boolean Compressed Sensing
Olgica Milenkovic
We introduce a novel probabilistic group testing framework, termed Poisson group testing, in which the number of defectives follows a right-… (see more)truncated Poisson distribution. The Poisson model has a number of new applications, including dynamic testing with diminishing relative rates of defectives. We consider both nonadaptive and semi-adaptive identification methods. For nonadaptive methods, we derive a lower bound on the number of tests required to identify the defectives with a probability of error that asymptotically converges to zero; in addition, we propose test matrix constructions for which the number of tests closely matches the lower bound. For semiadaptive methods, we describe a lower bound on the expected number of tests required to identify the defectives with zero error probability. In addition, we propose a stage-wise reconstruction algorithm for which the expected number of tests is only a constant factor away from the lower bound. The methods rely only on an estimate of the average number of defectives, rather than on the individual probabilities of subjects being defective.