Jian Tang

Biography

Jian Tang is an Associate Professor in the Department of Decision Sciences at HEC Montréal. He is also an Adjunct Professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal and a Core Academic Member of Mila – Quebec Artificial Intelligence Institute. He holds a Canada CIFAR AI Chair and is the founder of BioGeometry, a startup specializing in generative AI for antibody discovery. His main research areas are deep generative models, graph machine learning, and their applications to drug discovery. He is an internationally recognized leader in graph machine learning; his representative work on node representation learning, LINE, has been widely recognized and cited more than 5,000 times. He has also done extensive pioneering work on AI for drug discovery, including TorchDrug, the first open-source machine learning framework for drug discovery, and TorchProtein.

Current students

Huiyu Cai

PhD - UdeM

Principal supervisor:

Xixian Liu

PhD - Université de Montréal

Jiarui Lu

PhD - UdeM

Principal supervisor:

Gauthier Gidel

Xinyu Yuan

PhD - UdeM

Zhihao Zhan

PhD - UdeM

PhD - HEC

Jianan Zhao

PhD - UdeM

Publications

GENERator: A Long-Context Generative Genomic Foundation Model

Q. Li

Wei Wu

Yong Zhang

Zhihao Zhan

Rui Chen

Mingyang Li

Kun Fu

Junyan Qi

Yongzhou Bao

Chao Wang

Yiheng Zhu

Zhiyun Zhang

Fuli Feng

Jieping Ye

Liu Yuwen

Hui Xiong

Zheng Wang

Yuanyuan Zhang

Ruipu Chen …

Chao Wang

Jian Tang

2026-02-03

Research Square (accepted)

arxiv.org

Enhancing link prediction in biomedical knowledge graphs with BioPathNet

Emy Yue Hu

Svitlana Oleshko

Samuele Firmani

Hui Cheng

Zhaocheng Zhu

Maria Ulmer

Matthias Arnold

Maria Colomé-Tatché

Sophie Xhonneux

Annalisa Marsico

Understanding complex interactions in biomedical networks is crucial for advancements in biomedicine, but traditional link prediction (LP) methods are limited in capturing this complexity. We present BioPathNet, a graph neural network framework based on the neural Bellman–Ford network (NBFNet), addressing limitations of traditional representation-based learning methods through path-based reasoning for LP in biomedical knowledge graphs. Unlike node-embedding frameworks, BioPathNet learns representations between node pairs by considering all relations along paths, enhancing prediction accuracy and interpretability, and allowing visualization of influential paths and biological validation. BioPathNet leverages a background regulatory graph for enhanced message passing and uses stringent negative sampling to improve precision and scalability. BioPathNet outperforms or matches existing methods across diverse tasks including gene function annotation, drug–disease indication, synthetic lethality and lncRNA–target interaction prediction. Our study identifies promising additional drug indications for diseases such as acute lymphoblastic leukaemia and Alzheimer’s disease, validated by medical experts and clinical trials. In addition, we prioritize putative synthetic lethal gene pairs and regulatory lncRNA–target interactions. BioPathNet’s interpretability will enable researchers to trace prediction paths and gain molecular insights.

2026-01-19

Nature Biomedical Engineering (published)

Fast Proteome-Scale Protein Interaction Retrieval via Residue-Level Factorization

Narendra Chaudhary

Qian Cong

Jian Zhou

Sanchit Misra

Protein-protein interactions (PPIs) are mediated at the residue level. Most sequence-based PPI models consider residue-residue interactions across two proteins, which can yield accurate interaction scores but are too slow to scale. At proteome scale, identifying candidate PPIs requires evaluating nearly *all possible protein pairs*. For …

2025-12-31

International Conference on Learning Representations (accepted, poster)

Property-Driven Protein Inverse Folding with Multi-Objective Preference Alignment

Junqi Liu

Xiaoyang Hou

Chence Shi

Xin Liu

Zhi Yang

Protein sequence design must balance designability, defined as the ability to recover a target backbone, with multiple, often competing, developability properties such as solubility, thermostability, and expression. Existing approaches address these properties through post hoc mutation, inference-time biasing, or retraining on property-specific subsets, yet they are target dependent and demand substantial domain expertise or careful hyperparameter tuning. In this paper, we introduce ProtAlign, a multi-objective preference alignment framework that fine-tunes pretrained inverse folding models to satisfy diverse developability objectives while preserving structural fidelity. ProtAlign employs a semi-online Direct Preference Optimization strategy with a flexible preference margin to mitigate conflicts among competing objectives and constructs preference pairs using in silico property predictors. Applied to the widely used ProteinMPNN backbone, the resulting model MoMPNN enhances developability without compromising designability across tasks including sequence design for CATH 4.3 crystal structures, de novo generated backbones, and real-world binder design scenarios, making it an appealing framework for practical protein sequence design.

2025-12-31

International Conference on Learning Representations (accepted, poster)

Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute

Kieran Didi

Zuobai Zhang

Guoqing Zhou

Danny Reidenbach

Zhonglin Cao

Sooyoung Cha

Tomas Geffner

Christian Dallago

Michael Bronstein

Martin Steinegger

Emine Kucukbenli

Arash Vahdat

Karsten Kreis

Protein interaction modeling is central to protein design, which has been transformed by machine learning with broad applications in drug discovery and beyond. In this landscape, structure-based de novo binder design is most often cast as either conditional generative modeling or sequence optimization via structure predictors ("hallucination"). We argue that this is a false dichotomy and propose Complexa, a novel fully atomistic binder generation method unifying both paradigms. We extend recent flow-based latent protein generation architecture and leverage the domain-domain interactions of monomeric computationally predicted protein structures to construct Teddymer, a new large-scale dataset of synthetic binder-target pairs for pretraining. Combined with high-quality experimental multimers, this enables training a strong base model. We then perform inference-time optimization with this generative prior, unifying the strengths of previously distinct generative and hallucination methods. Complexa sets a new state of the art in computational binder design benchmarks: it delivers markedly higher in-silico success rates than existing generative approaches, and our novel test-time optimization strategies greatly outperform previous hallucination methods under normalized compute budgets. We further demonstrate explicit interface hydrogen bond optimization, fold class-guided binder generation, and extensions to small molecule targets and enzyme design tasks, again surpassing prior methods. Code, models and new data will be publicly released.

2025-12-31

International Conference on Learning Representations (accepted, oral)

Towards All-Atom Foundation Models for Biomolecular Binding Affinity Prediction

Liang Shi

Zuobai Zhang

Huiyu Cai

Santiago Miret

Zhi Yang

Biomolecular interactions play a critical role in biological processes. While recent breakthroughs like AlphaFold 3 have enabled accurate modeling of biomolecular complex structures, predicting binding affinity remains challenging mainly due to limited high-quality data. Recent methods are often specialized for specific types of biomolecular interactions, limiting their generalizability. In this work, we repurpose AlphaFold 3 for representation learning to predict binding affinity, a non-trivial task that requires shifting from generative structure prediction to encoding observed geometry, simplifying the heavily conditioned trunk module, and designing a framework to jointly capture sequence and structural information. To address these challenges, we introduce the **Atom-level Diffusion Transformer (ADiT)**, which takes sequence and structure as inputs, employs a unified tokenization scheme, integrates diffusion transformers, and removes dependencies on multiple sequence alignments and templates. We pre-train three ADiT variants on the PDB dataset with a denoising objective and evaluate them across protein-ligand, drug-target, protein-protein, and antibody-antigen interactions. The model achieves state-of-the-art or competitive performance across benchmarks, scales effectively with model size, and successfully identifies wet-lab validated affinity-enhancing antibody mutations, establishing a generalizable framework for biomolecular interactions. We plan to release the code upon acceptance.

2025-12-31

International Conference on Learning Representations (accepted, poster)

Efficient, Non-Destructive Transfer of Wafer-Scale Monolayer MoS₂ by Interface Engineering

Zheng Wei

Yongqing Cai

Jieying Liu

Liyan Zhang

Jiaojiao Zhao

Li Li

Qinqin Wang

Huimin Zhang

Zhihua Zhang

Dongxia Shi

Luojun Du

2025-11-09

Advanced Functional Materials (published)

Controllable Generation of Drug-like Molecules with Multi-modal Variational Flow

Fang Sun

Zhihao Zhan

Hongyu Guo

Ming Zhang

Yizhou Sun

Designing drug molecules that bind effectively to target proteins while maintaining desired pharmacological properties remains a fundamental challenge in drug discovery. Current approaches struggle to simultaneously control molecular topology and 3D geometry, often requiring expensive retraining for new design objectives. We propose a multi-modal variational flow framework that addresses these limitations by integrating a 2D topology encoder with a 3D geometry generator. Our architecture encodes molecular graphs into a learned latent distribution via junction tree representations, then employs normalizing flows to autoregressively generate atoms in 3D space conditioned on the protein binding site. This design enables zero-shot controllability: by manipulating the latent prior distribution, we can generate molecules with specific substructures or optimized properties without model retraining. Experiments on the CrossDocked benchmark show that our model achieves 31.1% high-affinity rate, substantially outperforming existing methods, while maintaining superior drug-likeness and structural diversity. Our framework opens new possibilities for on-demand molecular design, allowing medicinal chemists to rapidly explore chemical space with precise control over both structural motifs and physicochemical properties.

2025-10-21

logconference.io/LOG/2025/Conference (poster)

A Hardware‐in‐Loop Digital Twin Approach for Intelligent Optimization of Municipal Solid Waste Incineration

Wen Yu

JunFei Qiao

2025-10-20

(published)

Rapid De Novo Antibody Design with GeoFlow-V3

BioGeometry Team

Recent years have witnessed striking advances in miniprotein design, yet de novo antibody discovery remains challenging, marked by low binding rates and the need for extensive, labor-intensive experimental screening of millions of candidates. This technical report introduces GeoFlow-V3, a unified atomic generative model for structure prediction and protein design. GeoFlow-V3 delivers improved accuracy on antibody-antigen complex structure prediction relative to our previous version, and its performance is further enhanced when experimental constraints or prior knowledge are provided, enabling precise control over both folding and design. The model also demonstrates reliable ability to discriminate binders from non-binders based on its confidence scores. Leveraging this capability, we build a GeoFlow-V3 in silico pipeline to design no more than 50 nanobodies per therapeutically relevant target de novo, completing a single round of wet-lab characterization in under three weeks. GeoFlow-V3 identifies at least one binder for 8 tested epitopes and achieves an average hit rate of 15.5%, representing a two-orders-of-magnitude improvement over prior computational pipelines. These results position GeoFlow-V3 as an appealing platform for rapid, AI-driven therapeutic antibody discovery, significantly reducing experimental screening demands and offering a powerful avenue to tackle previously undruggable targets. A demo of GeoFlow-V3 can be accessed via prot.design for non-commercial use.

2025-10-20

bioRxiv (preprint)

Aligning Protein Conformation Ensemble Generation with Physical Feedback

Jiarui Lu

Xiaoyin Chen

Stephen Z. Lu

Aurelie Lozano

Vijil Chenthamarakshan

Payel Das

Protein dynamics play a crucial role in protein biological functions and properties, and their traditional study typically relies on time-consuming molecular dynamics (MD) simulations conducted in silico. Recent advances in generative modeling, particularly denoising diffusion models, have enabled efficient accurate protein structure prediction and conformation sampling by learning distributions over crystallographic structures. However, effectively integrating physical supervision into these data-driven approaches remains challenging, as standard energy-based objectives often lead to intractable optimization. In this paper, we introduce Energy-based Alignment (EBA), a method that aligns generative models with feedback from physical models, efficiently calibrating them to appropriately balance conformational states based on their energy differences. Experimental results on the MD ensemble benchmark demonstrate that EBA achieves state-of-the-art performance in generating high-quality protein ensembles. By improving the physical plausibility of generated structures, our approach enhances model predictions and holds promise for applications in structural biology and drug discovery.

2025-10-05

Proceedings of the 42nd International Conference on Machine Learning (published)