Jian Tang

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, HEC Montréal, Department of Decision Sciences
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Founder, BioGeometry
Research Topics
AI for Science
Computational Biology
Generative Models
Graph Neural Networks
Large Language Models (LLM)
Molecular Modeling

Biography

Jian Tang is an associate professor in the Department of Decision Sciences at HEC Montréal. He is also an adjunct professor in the Department of Computer Science and Operations Research at Université de Montréal and a Core Academic Member at Mila - Quebec AI Institute. He is a Canada CIFAR AI Chair and the founder of BioGeometry, an AI startup that focuses on generative AI for antibody discovery. Tang’s main research interests are deep generative models and graph machine learning, and their applications to drug discovery. He is an international leader in graph machine learning; LINE, his node representation method, has been widely recognized and cited more than five thousand times. He has also done pioneering work on AI for drug discovery, including developing TorchDrug and TorchProtein, the first open-source machine learning frameworks for drug discovery.

Current Students

PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal

Publications

Engineered Nonheme Iron Enzymes Enable Asymmetric Hydrogenation of Alkenes
Yunfei He
Shuang-Yu Dai
Mei‐Yan Xu
Baixu Ma
Lizhi Tao
Developing biocatalytic systems capable of reducing simple alkenes is highly desirable for synthetic chemistry and biosynthesis, yet existing enzymes remain largely restricted to their ability to convert polarized, electron-deficient substrates. Here, we present a nonheme iron metalloenzyme platform that enables hydrogenation of styrenes, conjugated nitriles and amides, and nonconjugated olefins through a putative iron–hydride mechanism. Starting from the Fe(II)/α-ketoglutarate-dependent dioxygenase GOX, iterative rounds of directed evolution produced an engineered “alkene hydrogenase” (AHase-6) containing 16 mutations and promoting NaBH4-driven reduction across diverse C═C bond motifs. Kinetic analysis indicates that this enzymatic hydrogenation process proceeds via formation of an enzyme–substrate ternary complex through a sequential mechanism. Mechanistic studies further reveal that alkene insertion occurs with regioselectivity governed primarily by substrate electronics and sterics. These findings establish nonheme iron enzymes as an unrecognized scaffold for metal–hydride-based hydrogenation and highlight their potential as sustainable, tunable alternatives to traditional catalytic systems.
Plausibility Is Not Prediction: Contrastive Evidence for LLM-Based Cellular Perturbation Reasoning
Yashi Zhang
Hongyu Guo
Perturbation experiments are central to understanding cellular mechanisms, but remain costly and sparse, motivating prediction of gene expression responses for unobserved conditions. A promising recent direction leverages large language models (LLMs) as "virtual cell" simulators, using stepwise, knowledge-grounded mechanistic reasoning to infer differential expression, and points toward an interpretable, knowledge-driven paradigm that transcends purely data-driven approaches. However, we find that plausibility is not prediction: despite producing biologically plausible explanations, these methods fail to capture perturbation-specific effects, systematically overestimating differential expression, often underperforming a simple gene-frequency baseline in aggregate evaluations, and collapsing to chance-level performance at the per-gene level. This reveals a reliance on intrinsic gene response tendencies rather than true perturbation reasoning. We trace this failure to how evidence is presented: existing methods evaluate perturbation-gene pairs in isolation, without exposing how related perturbations differ in their effects on the same gene. To address this limitation, we introduce CORE (Contrastive Organization of Relational Evidence), which reframes prediction as a comparison task by organizing evidence into positive and negative outcomes from related perturbations. Using a biomedical knowledge graph for evidence retrieval, CORE improves calibration and substantially boosts perturbation-specific prediction in both LLM-based and non-LLM settings: for example, on drug-perturbation data, CORE-Reasoning improves Qwen3.5-9B aggregate metrics by up to 28.6%, while on generic perturbation data, CORE-Voting raises macro-per-gene AUROC from chance to 0.703 on average across four cell lines. This highlights contrastive evidence organization as essential to reliable LLM-based perturbation reasoning.
Learning Structure, Energy, and Dynamics: A Survey of Artificial Intelligence for Protein Dynamics
Haocheng Tang
Liang Shi
Protein dynamics underlie many biological functions, yet remain difficult to characterize due to the high computational cost of molecular dynamics simulations and the scarcity of dynamic structural data. This survey reviews recent advances in artificial intelligence for protein dynamics from three perspectives: learning from structural ensembles and trajectories, learning from physical energy signals, and learning to accelerate molecular simulations. We summarize representative methods for conformation ensemble generation, trajectory generation, Boltzmann generators, physics-aware adaptation, machine learning potentials, coarse-grained modeling, and collective variable discovery. We further discuss available datasets and key open challenges, such as scalability, thermodynamic consistency, kinetic fidelity, and integration with experimental constraints.
RobotPan: A 360° Surround-View Robotic Vision System for Embodied Perception
Jiahao Ma
Qiang Zhang
Peiran Liu
Zeran Su
Pihai Sun
Gang Han
Wen Zhao
Wei Cui
Zhang Zhang
Zhiyuan Xu
Renjing Xu
Miaomiao Liu
Yijie Guo
Surround-view perception is increasingly important for robotic navigation and loco-manipulation, especially in human-in-the-loop settings such as teleoperation, data collection, and emergency takeover. However, current robotic visual interfaces are often limited to narrow forward-facing views, or, when multiple on-board cameras are available, require cumbersome manual switching that interrupts the operator's workflow. Both configurations suffer from motion-induced jitter that causes simulator sickness in head-mounted displays. We introduce a surround-view robotic vision system that combines six cameras with LiDAR to provide full 360
Antibody discovery technology: innovation and outlook from classic to leading edge
PengFei WANG
Atomic Trajectory Modeling with State Space Models for Biomolecular Dynamics
Liang Shi
Junqi Liu
Zhi Yang
Understanding the dynamic behavior of biomolecules is fundamental to elucidating biological function and facilitating drug discovery. While Molecular Dynamics (MD) simulations provide a rigorous physical basis for studying these dynamics, they remain computationally expensive for long timescales. Conversely, recent deep generative models accelerate conformation generation but typically either fail to model temporal relationships or are built only for monomeric proteins. To bridge this gap, we introduce ATMOS, a novel generative framework based on State Space Models (SSM) designed to generate atom-level MD trajectories for biomolecular systems. ATMOS integrates a Pairformer-based state transition mechanism to capture long-range temporal dependencies, with a diffusion-based module to decode trajectory frames in an autoregressive manner. ATMOS is trained on crystal structures from PDB and conformation trajectories from large-scale MD simulation datasets including mdCATH and MISATO. We demonstrate that ATMOS achieves state-of-the-art performance in generating conformation trajectories for both protein monomers and complex protein-ligand systems. By enabling efficient inference of atomic motion trajectories, this work establishes a promising foundation for modeling biomolecular dynamics.
PerturbDiff: Functional Diffusion for Single-Cell Perturbation Modeling
Yashi Zhang
Hongyu Guo
Building Virtual Cells that can accurately simulate cellular responses to perturbations is a long-standing goal in systems biology. A fundamental challenge is that high-throughput single-cell sequencing is destructive: the same cell cannot be observed both before and after a perturbation. Thus, perturbation prediction requires mapping unpaired control and perturbed populations. Existing models address this by learning maps between distributions, but typically assume a single fixed response distribution when conditioned on observed cellular context (e.g., cell type) and the perturbation type. In reality, responses vary systematically due to unobservable latent factors such as microenvironmental fluctuations and complex batch effects, forming a manifold of possible distributions for the same observed conditions. To account for this variability, we introduce PerturbDiff, which shifts modeling from individual cells to entire distributions. By embedding distributions as points in a Hilbert space, we define a diffusion-based generative process operating directly over probability distributions. This allows PerturbDiff to capture population-level response shifts across hidden factors. Benchmarks on established datasets show that PerturbDiff achieves state-of-the-art performance in single-cell response prediction and generalizes substantially better to unseen perturbations. See our project page (https://katarinayuan.github.io/PerturbDiff-ProjectPage/), where code and data will be made publicly available (https://github.com/DeepGraphLearning/PerturbDiff).
GeneZip: Region-Aware Compression for Long Context DNA Modeling
Genomic sequences span billions of base pairs (bp), posing a fundamental challenge for genome-scale foundation models. Existing approaches largely sidestep this barrier by either scaling relatively small models to long contexts or relying on heavy multi-GPU parallelism. Here we introduce GeneZip, a DNA compression model that leverages a key biological prior: genomic information is highly imbalanced. Coding regions comprise only a small fraction (about 2 percent) yet are information-dense, whereas most non-coding sequence is comparatively information-sparse. GeneZip couples HNet-style dynamic routing with a region-aware compression-ratio objective, enabling adaptive allocation of representation budget across genomic regions. As a result, GeneZip learns region-aware compression and achieves 137.6x compression with only 0.31 perplexity increase. On downstream long-context benchmarks, GeneZip achieves comparable or better performance on contact map prediction, expression quantitative trait loci prediction, and enhancer-target gene prediction. By reducing effective sequence length, GeneZip unlocks simultaneous scaling of context and capacity: compared to the prior state-of-the-art model JanusDNA, it enables training models 82.6x larger at 1M-bp context, supporting a 636M-parameter GeneZip model at 1M-bp context. All experiments in this paper can be trained on a single A100 80GB GPU.
GENERator: A Long-Context Generative Genomic Foundation Model
Q. Li
Wei Wu
Yong Zhang
Rui Chen
Mingyang Li
Kun Fu
Junyan Qi
Yongzhou Bao
Chao Wang
Yiheng Zhu
Zhiyun Zhang
Fuli Feng
Jieping Ye
Liu Yuwen
Hui Xiong
Zheng Wang
Zhang, Yuanyuan
Chen, Ruipu
Wang, Chao
Tang, Jian
Enhancing link prediction in biomedical knowledge graphs with BioPathNet
Emy Yue Hu
Svitlana Oleshko
Samuele Firmani
Hui Cheng
Maria Ulmer
Matthias Arnold
Maria Colomé-Tatché
Annalisa Marsico
Understanding complex interactions in biomedical networks is crucial for advancements in biomedicine, but traditional link prediction (LP) methods are limited in capturing this complexity. We present BioPathNet, a graph neural network framework based on the neural Bellman–Ford network (NBFNet), addressing limitations of traditional representation-based learning methods through path-based reasoning for LP in biomedical knowledge graphs. Unlike node-embedding frameworks, BioPathNet learns representations between node pairs by considering all relations along paths, enhancing prediction accuracy and interpretability, and allowing visualization of influential paths and biological validation. BioPathNet leverages a background regulatory graph for enhanced message passing and uses stringent negative sampling to improve precision and scalability. BioPathNet outperforms or matches existing methods across diverse tasks including gene function annotation, drug–disease indication, synthetic lethality and lncRNA–target interaction prediction. Our study identifies promising additional drug indications for diseases such as acute lymphoblastic leukaemia and Alzheimer’s disease, validated by medical experts and clinical trials. In addition, we prioritize putative synthetic lethal gene pairs and regulatory lncRNA–target interactions. BioPathNet’s interpretability will enable researchers to trace prediction paths and gain molecular insights.
Fast Proteome-Scale Protein Interaction Retrieval via Residue-Level Factorization
Narendra Chaudhary
Qian Cong
Jian Zhou
Sanchit Misra
Protein-protein interactions (PPIs) are mediated at the residue level. Most sequence-based PPI models consider residue-residue interactions across two proteins, which can yield accurate interaction scores but are too slow to scale. At proteome scale, identifying candidate PPIs requires evaluating nearly *all possible protein pairs*. For