Jian Tang

Biography

Jian Tang is an Associate professor at HEC's Department of Decision Sciences. He is also an Adjunct professor at the Department of Computer Science and Operations Research at University of Montreal and a Core Academic member at Mila - Quebec AI Institute. He is a Canada CIFAR AI Chair and the Founder of BioGeometry, an AI startup that focuses on generative AI for antibody discovery. Tang’s main research interests are deep generative models and graph machine learning, and their applications to drug discovery. He is an international leader in graph machine learning, and LINE, his node representation method, has been widely recognized and cited more than five thousand times. He has also done pioneering work on AI for drug discovery, such as developing the first open-source machine learning frameworks for drug discovery, TorchDrug and TorchProtein.

Current Students

Huiyu Cai

PhD - Université de Montréal

huiyu.cai@mila.quebec

Collaborating researcher

Farzaneh Heidari

PhD - Université de Montréal

Principal supervisor :

Guillaume Rabusseau

Jerry Lu

PhD - Université de Montréal

Shentong Mo

Collaborating researcher - Carnegie Mellon University

Chence Shi

PhD - Université de Montréal

Sophie Xhonneux

PhD - Université de Montréal

Principal supervisor :

Gauthier Gidel

lpxhonneux@gmail.com

Xinyu Yuan

PhD - Université de Montréal

xinyu.yuan@mila.quebec

Zhihao Zhan

PhD - Université de Montréal

Zuobai Zhang

PhD - Université de Montréal

zhaocheng.zhu@umontreal.ca

Jianan Zhao

PhD - Université de Montréal

Publications

Zero-shot Logical Query Reasoning on any Knowledge Graph

Mikhail Galkin

Jincheng Zhou

Bruno Ribeiro

Zhaocheng Zhu

Complex logical query answering (CLQA) in knowledge graphs (KGs) goes beyond simple KG completion and aims at answering compositional queries comprised of multiple projections and logical operations. Existing CLQA methods that learn parameters bound to certain entity or relation vocabularies can only be applied to the graph they are trained on which requires substantial training time before being deployed on a new graph. Here we present UltraQuery, an inductive reasoning model that can zero-shot answer logical queries on any KG. The core idea of UltraQuery is to derive both projections and logical operations as vocabulary-independent functions which generalize to new entities and relations in any KG. With the projection operation initialized from a pre-trained inductive KG reasoning model, UltraQuery can solve CLQA on any KG even if it is only finetuned on a single dataset. Experimenting on 23 datasets, UltraQuery in the zero-shot inference mode shows competitive or better query answering performance than best available baselines and sets a new state of the art on 14 of them.

2024-01-01

NeurIPS (published)

Giant Correlated Gap and Possible Room-Temperature Correlated States in Twisted Bilayer MoS_{2}.

Fanfan Wu

Qiaoling Xu

Qinqin Wang

Yanbang Chu

Lu Li

Jieying Liu

Jinpeng Tian

Yiru Ji

Le Liu

Yalong Yuan

Zhiheng Huang

Jiaojiao Zhao

Xiaozhou Zan

Kenji Watanabe

Takashi Taniguchi

Dongxia Shi

Gangxu Gu

Yang Xu

Lede Xian

Wei Yang

Luojun Du

Guangyu Zhang

Moiré superlattices have emerged as an exciting condensed-matter quantum simulator for exploring the exotic physics of strong electronic correlations. Notable progress has been witnessed, but such correlated states are achievable usually at low temperatures. Here, we report evidence of possible room-temperature correlated electronic states and layer-hybridized SU(4) model simulator in AB-stacked MoS_{2} homobilayer moiré superlattices. Correlated insulating states at moiré band filling factors v=1, 2, 3 are unambiguously established in twisted bilayer MoS_{2}. Remarkably, the correlated electronic state at v=1 shows a giant correlated gap of ∼126 meV and may persist up to a record-high critical temperature over 285 K. The realization of a possible room-temperature correlated state with a large correlated gap in twisted bilayer MoS_{2} can be understood as the cooperation effects of the stacking-specific atomic reconstruction and the resonantly enhanced interlayer hybridization, which largely amplify the moiré superlattice effects on electronic correlations. Furthermore, extreme large nonlinear Hall responses up to room temperature are uncovered near correlated electronic states, demonstrating the quantum geometry of moiré flat conduction band.

2023-12-18

Physical Review Letters (published)

Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing

Shengchao Liu

Weili Nie

Chengpeng Wang

Jiarui Lu

Zhuoran Qiao

Ling Liu

Chaowei Xiao

Animashree Anandkumar

There is increasing adoption of artificial intelligence in drug discovery. However, existing studies use machine learning to mainly utilize the chemical structures of molecules but ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions and predict complex biological activities. Here we present a multi-modal molecule structure-text model, MoleculeSTM, by jointly learning molecules' chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct a large multi-modal dataset, namely, PubChemSTM, with over 280,000 chemical structure-text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions, including structure-text retrieval and molecule editing. MoleculeSTM has two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM obtains the state-of-the-art generalization ability to novel biochemical concepts across various benchmarks.

2023-12-18

Nature Machine Intelligence (published)
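
The MoleculeSTM abstract above mentions jointly learning structure and text representations via a contrastive learning strategy. As a rough illustration of that general idea, here is a minimal sketch of a symmetric InfoNCE-style loss over paired embeddings. It is not taken from the MoleculeSTM code: the function names, the temperature value, and the use of plain Python lists in place of real encoder outputs are all assumptions made for this example.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _log_softmax(row):
    """Numerically stable log-softmax of one row of similarity scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def symmetric_contrastive_loss(struct_emb, text_emb, temperature=0.1):
    """InfoNCE-style loss: pair i of struct_emb/text_emb is the positive,
    every other pairing in the batch is treated as a negative.

    The embeddings stand in for the outputs of a (hypothetical) structure
    encoder and text encoder; temperature=0.1 is an arbitrary choice.
    """
    n = len(struct_emb)
    s = [_normalize(v) for v in struct_emb]
    t = [_normalize(v) for v in text_emb]
    # Cosine similarity between every structure and every text, scaled.
    sim = [
        [sum(a * b for a, b in zip(s[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Structure -> text direction: the matching text sits on the diagonal.
    loss_s2t = -sum(_log_softmax(sim[i])[i] for i in range(n)) / n
    # Text -> structure direction: same computation on the transpose.
    sim_t = [list(col) for col in zip(*sim)]
    loss_t2s = -sum(_log_softmax(sim_t[j])[j] for j in range(n)) / n
    return 0.5 * (loss_s2t + loss_t2s)


aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[1.0, 0.0], [0.0, 1.0]])
shuffled = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                      [[0.0, 1.0], [1.0, 0.0]])
```

With these toy inputs, the loss for correctly paired embeddings (`aligned`) is much smaller than for mismatched pairs (`shuffled`), which is the behaviour a contrastive objective is meant to reward.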

Pretrainable Geometric Graph Neural Network for Antibody Affinity Maturation

Huiyu Cai

Zuobai Zhang

Mingkai Wang

Bozitao Zhong

Yanling Wu

Tianlei Ying

In the realm of antibody therapeutics development, increasing the binding affinity of an antibody to its target antigen is a crucial task. This paper presents GearBind, a pretrainable deep neural network designed to be effective for in silico affinity maturation. Leveraging multi-level geometric message passing alongside contrastive pretraining on protein structural data, GearBind capably models the complex interplay of atom-level interactions within protein complexes, surpassing previous state-of-the-art approaches on SKEMPI v2 in terms of Pearson correlation, mean absolute error (MAE) and root mean square error (RMSE). In silico experiments elucidate that pretraining helps GearBind become sensitive to mutation-induced binding affinity changes and reflective of amino acid substitution tendency. Using an ensemble model based on pretrained GearBind, we successfully optimize the affinity of CR3022 to the spike (S) protein of the SARS-CoV-2 Omicron strain. Our strategy yields a high success rate with up to 17-fold affinity increase. GearBind proves to be an effective tool in narrowing the search space for in vitro antibody affinity maturation, underscoring the utility of geometric deep learning and adept pre-training in macromolecule interaction modeling.

2023-12-07

bioRxiv (preprint)
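
The GearBind abstract above reports results in terms of Pearson correlation, mean absolute error (MAE) and root mean square error (RMSE). For readers unfamiliar with these metrics, the following is a small self-contained sketch of how they are commonly computed from true and predicted values. The function names and the toy numbers are assumptions for illustration only; they are not GearBind's evaluation code and not values from the paper.

```python
import math


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("Pearson correlation is undefined for constant input")
    return cov / math.sqrt(var_t * var_p)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


# Toy values, made up for this example.
y_true = [1.0, 2.0, 3.0]
y_pred = [2.0, 4.0, 6.0]
r = pearson(y_true, y_pred)
err_mae = mae(y_true, y_pred)
err_rmse = rmse(y_true, y_pred)
```

The toy predictions are perfectly correlated with the targets yet numerically off, which shows why a correlation metric is usually reported together with error metrics such as MAE and RMSE.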

Room-temperature correlated states in twisted bilayer MoS$_2$

Fanfan Wu

Qiaoling Xu

Qinqin Wang

Yanbang Chu

Lu Li

Jieying Liu

Jinpeng Tian

Yiru Ji

Le Liu

Yalong Yuan

Zhiheng Huang

Jiaojiao Zhao

Xiaozhou Zan

Kenji Watanabe

Takashi Taniguchi

Dongxia Shi

Gangxu Gu

Yang Xu

L. Xian

Wei Yang

Luojun Du

Guangyu Zhang

2023-11-28

ArXiv (preprint)

PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design

Chuanrui Wang

Bozitao Zhong

Zuobai Zhang

Narendra Chaudhary

Sanchit Misra

Structure-based protein design has attracted increasing interest, with numerous methods being introduced in recent years. However, a universally accepted method for evaluation has not been established, since the wet-lab validation can be overly time-consuming for the development of new algorithms, and the

2023-10-25

NeurIPS.cc/2023/Workshop/AI4D3 (poster)

Large Language Models can Learn Rules

Zhaocheng Zhu

Yuan Xue

Xinyun Chen

Denny Zhou

Dale Schuurmans

Hanjun Dai

2023-10-11

ArXiv (preprint)

GraphText: Graph Reasoning in Text Space

Jianan Zhao

Le Zhuo

Yikang Shen

Meng Qu

Kai Liu

Michael Bronstein

Zhaocheng Zhu

Large Language Models (LLMs) have gained the ability to assimilate human knowledge and facilitate natural language interactions with both humans and other LLMs. However, despite their impressive achievements, LLMs have not made significant advancements in the realm of graph machine learning. This limitation arises because graphs encapsulate distinct relational data, making it challenging to transform them into natural language that LLMs understand. In this paper, we bridge this gap with a novel framework, GraphText, that translates graphs into natural language. GraphText derives a graph-syntax tree for each graph that encapsulates both the node attributes and inter-node relationships. Traversal of the tree yields a graph text sequence, which is then processed by an LLM to treat graph tasks as text generation tasks. Notably, GraphText offers multiple advantages. It introduces training-free graph reasoning: even without training on graph data, GraphText with ChatGPT can achieve on par with, or even surpassing, the performance of supervised-trained graph neural networks through in-context learning (ICL). Furthermore, GraphText paves the way for interactive graph reasoning, allowing both humans and LLMs to communicate with the model seamlessly using natural language. These capabilities underscore the vast, yet-to-be-explored potential of LLMs in the domain of graph machine learning.

2023-10-02

ArXiv (preprint)
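
The GraphText abstract above describes deriving a tree that holds node attributes and inter-node relationships, then traversing it to obtain a text sequence for an LLM. The sketch below illustrates that general pattern on a toy graph. The tree layout, the branch names, and the output wording are hypothetical choices made for this example; the actual graph-syntax tree and prompt format used by GraphText are not specified in the abstract and may differ.

```python
def build_syntax_tree(node, features, labels, adjacency):
    """Build a small tree for one node: one branch for its own attribute,
    one branch for the labels of its direct neighbours.

    All arguments are plain dicts keyed by node id (an assumed toy format).
    """
    neighbors = sorted(adjacency.get(node, []))
    return {
        "node": node,
        "children": [
            {"name": "feature", "leaves": [features[node]]},
            {
                "name": "neighbor labels",
                "leaves": [labels.get(n, "unknown") for n in neighbors],
            },
        ],
    }


def tree_to_text(tree):
    """Traverse the tree root-first, left to right, and emit one text line."""
    parts = [f"node {tree['node']}"]
    for child in tree["children"]:
        leaves = ", ".join(str(leaf) for leaf in child["leaves"]) or "none"
        parts.append(f"{child['name']}: {leaves}")
    return "; ".join(parts)


# Toy citation-style graph, made up for this example.
features = {"a": "paper on GNNs"}
labels = {"b": "ML", "c": "DB"}
adjacency = {"a": ["c", "b"]}

prompt_line = tree_to_text(build_syntax_tree("a", features, labels, adjacency))
print(prompt_line)
```

The resulting string could be placed in a prompt alongside a question such as which label the node should receive, turning a node classification task into a text generation task.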

An Empirical Study of Retrieval-Enhanced Graph Neural Networks

Dingmin Wang

Shengchao Liu

Hanchen Wang

Bernardo Cuenca Grau

Linfeng Song

Le Song

Qi Liu

Graph Neural Networks (GNNs) are effective tools for graph representation learning. Most GNNs rely on a recursive neighborhood aggregation scheme, named message passing, thereby their theoretical expressive power is limited to the first-order Weisfeiler-Lehman test (1-WL). An effective approach to this challenge is to explicitly retrieve some annotated examples used to enhance GNN models. While retrieval-enhanced models have been proved to be effective in many language and vision domains, it remains an open question how effective retrieval-enhanced GNNs are when applied to graph datasets. Motivated by this, we want to explore how the retrieval idea can help augment the useful information learned in the graph neural networks, and we design a retrieval-enhanced scheme called GRAPHRETRIEVAL, which is agnostic to the choice of graph neural network models. In GRAPHRETRIEVAL, for each input graph, similar graphs together with their ground-truth labels are retrieved from an existing database. Thus they can act as a potential enhancement to complete various graph property predictive tasks. We conduct comprehensive experiments over 13 datasets, and we observe that GRAPHRETRIEVAL is able to reach substantial improvements over existing GNNs. Moreover, our empirical study also illustrates that retrieval enhancement is a promising remedy for alleviating the long-tailed label distribution problem.

2023-09-28

Frontiers in Artificial Intelligence and Applications (published)

Evaluating Self-Supervised Learning for Molecular Graph Embeddings

Hanchen Wang

Jean Kaddour

Shengchao Liu

Matt J. Kusner

Joan Lasenby

Qi Liu

Graph Self-Supervised Learning (GSSL) provides a robust pathway for acquiring embeddings without expert labelling, a capability that carries profound implications for molecular graphs due to the staggering number of potential molecules and the high cost of obtaining labels. However, GSSL methods are designed not for optimisation within a specific domain but rather for transferability across a variety of downstream tasks. This broad applicability complicates their evaluation. Addressing this challenge, we present "Molecular Graph Representation Evaluation" (MOLGRAPHEVAL), generating detailed profiles of molecular graph embeddings with interpretable and diversified attributes. MOLGRAPHEVAL offers a suite of probing tasks grouped into three categories: (i) generic graph, (ii) molecular substructure, and (iii) embedding space properties. By leveraging MOLGRAPHEVAL to benchmark existing GSSL methods against both current downstream datasets and our suite of tasks, we uncover significant inconsistencies between inferences drawn solely from existing datasets and those derived from more nuanced probing. These findings suggest that current evaluation methodologies fail to capture the entirety of the landscape.