Portrait of Jian Tang

Jian Tang

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, HEC Montréal, Department of Decision Sciences
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research (DIRO)
Founder, BioGeometry
Research Topics
Computational Biology
Large Language Models (LLM)
AI for Science
Generative Models
Molecular Modeling
Graph Neural Networks

Biography

Jian Tang is an associate professor in the Department of Decision Sciences at HEC Montréal. He is also an adjunct professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal and a core academic member of Mila – Quebec Artificial Intelligence Institute. He holds a Canada CIFAR AI Chair and is the founder of BioGeometry, a startup specializing in generative AI for antibody discovery. His main research areas are deep generative models, graph machine learning, and their applications to drug discovery. He is an international leader in graph machine learning, and his representative work on node representation learning, LINE, has been widely recognized and cited more than 5,000 times. He has also done pioneering work on AI for drug discovery, including the first open-source machine learning frameworks for drug discovery, TorchDrug and TorchProtein.

Current Students

PhD - UdeM
Principal supervisor:
PhD - Université de Montréal
PhD - UdeM
Principal supervisor:
PhD - UdeM
PhD - UdeM

Publications

Multi-reservoir ESN-based prediction strategy for dynamic multi-objective optimization
Cuili Yang
Danlei Wang
JunFei Qiao
Wen Yu
NOx emissions prediction for MSWI process based on dynamic modular neural network
Haoshan Duan
Xi Meng
JunFei Qiao
Online Measurement of Dioxin Emission in Solid Waste Incineration Using Fuzzy Broad Learning
Heng Xia
Wen Yu
JunFei Qiao
Dioxin (DXN) is a persistent organic pollutant produced from municipal solid waste incineration (MSWI) processes. It is a crucial environmental indicator to minimize emission concentration by using optimization control, but it is difficult to monitor in real time. Aiming at online soft-sensing of DXN emission, a novel fuzzy tree broad learning system (FTBLS) is proposed, which includes offline training and online measurement. In the offline training part, weighted k-means is presented to construct a typical sample pool for reduced learning costs of offline and online phases. Moreover, the novel FTBLS, which contains a feature mapping layer, enhance layer, and increment layer, by replacing the fuzzy decision tree with neurons applied to construct the offline model. In the online measurement part, recursive principal component analysis is used to monitor the time-varying characteristic of the MSWI process. To measure DXN emission, offline FTBLS is reused for normal samples; for drift samples, fast incremental learning is used for online updates. A DXN data from the actual MSWI process is employed to prove the usefulness of FTBLS, where the RMSE of training and testing data are 0.0099 and 0.0216, respectively. This result shows that FTBLS can effectively realize DXN online prediction.
The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges
Qincheng Lu
Lirong Wu
Xinyu Wang
Xiao-Wen Chang
Rex Ying
Stan Z. Li
Stefanie Jegelka
Homophily principle, i.e., nodes with the same labels or similar attributes are more likely to be connected, has been commonly believed to be the main reason for the superiority of Graph Neural Networks (GNNs) over traditional Neural Networks (NNs) on graph-structured data, especially on node-level tasks. However, recent work has identified a non-trivial set of datasets where GNN's performance compared to the NN's is not satisfactory. Heterophily, i.e. low homophily, has been considered the main cause of this empirical observation. People have begun to revisit and re-evaluate most existing graph models, including graph transformer and its variants, in the heterophily scenario across various kinds of graphs, e.g. heterogeneous graphs, temporal graphs and hypergraphs. Moreover, numerous graph-related applications are found to be closely related to the heterophily problem. In the past few years, considerable effort has been devoted to studying and addressing the heterophily issue. In this survey, we provide a comprehensive review of the latest progress on heterophilic graph learning, including an extensive summary of benchmark datasets and evaluation of homophily metrics on synthetic graphs, meticulous classification of the most updated supervised and unsupervised learning methods, thorough digestion of the theoretical analysis on homophily/heterophily, and broad exploration of the heterophily-related applications. Notably, through detailed experiments, we are the first to categorize benchmark heterophilic datasets into three sub-categories: malignant, benign and ambiguous heterophily. Malignant and ambiguous datasets are identified as the real challenging datasets to test the effectiveness of new models on the heterophily challenge. Finally, we propose several challenges and future directions for heterophilic graph representation learning.
Tree Broad Learning System for Small Data Modeling.
Heng Xia
Wen Yu
JunFei Qiao
Broad learning system based on neural network (BLS-NN) has poor efficiency for small data modeling with various dimensions. Tree-based BLS (TBLS) is designed for small data modeling by introducing nondifferentiable modules and an ensemble strategy to the traditional broad learning system (BLS). TBLS replaces the neurons of BLS with the tree modules to map the input data. Moreover, we present three new TBLS variant methods and their incremental learning implementations, which are motivated by deep, broad, and ensemble learning. Their major distinction is reflected in the incremental learning strategies based on: 1) mean square error (mse); 2) pseudo-inverse; and 3) pseudo-inverse theory and stack representation. Therefore, this study further explores the domain of BLS based on the nondifferentiable modules. The simulations are compared with some state-of-the-art (SOTA) BLS-NN and tree methods under high-, medium-, and low-dimensional benchmark datasets. Results show that the proposed method outperforms the BLS-NN, and the modeling accuracy is remarkably improved with the small training data of the proposed TBLS.
Giant Correlated Gap and Possible Room-Temperature Correlated States in Twisted Bilayer MoS_{2}.
Fanfan Wu
Qiaoling Xu
Qinqin Wang
Yanbang Chu
Li Li
Jieying Liu
Jinpeng Tian
Yiru Ji
Le Liu
Yalong Yuan
Zhiheng Huang
Jiaojiao Zhao
Xiaozhou Zan
Kenji Watanabe
Takashi Taniguchi
Dongxia Shi
Gangxu Gu
Yang Xu
Lede Xian
Wei Yang
Luojun Du
Guangyu Zhang
Moiré superlattices have emerged as an exciting condensed-matter quantum simulator for exploring the exotic physics of strong electronic correlations. Notable progress has been witnessed, but such correlated states are achievable usually at low temperatures. Here, we report evidence of possible room-temperature correlated electronic states and layer-hybridized SU(4) model simulator in AB-stacked MoS_{2} homobilayer moiré superlattices. Correlated insulating states at moiré band filling factors v=1, 2, 3 are unambiguously established in twisted bilayer MoS_{2}. Remarkably, the correlated electronic state at v=1 shows a giant correlated gap of ∼126 meV and may persist up to a record-high critical temperature over 285 K. The realization of a possible room-temperature correlated state with a large correlated gap in twisted bilayer MoS_{2} can be understood as the cooperation effects of the stacking-specific atomic reconstruction and the resonantly enhanced interlayer hybridization, which largely amplify the moiré superlattice effects on electronic correlations. Furthermore, extreme large nonlinear Hall responses up to room temperature are uncovered near correlated electronic states, demonstrating the quantum geometry of moiré flat conduction band.
Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing
Weili Nie
Chengpeng Wang
Zhuoran Qiao
Ling Liu
Chaowei Xiao
Animashree Anandkumar
There is increasing adoption of artificial intelligence in drug discovery. However, existing studies use machine learning to mainly utilize the chemical structures of molecules but ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions and predict complex biological activities. Here we present a multi-modal molecule structure–text model, MoleculeSTM, by jointly learning molecules’ chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct a large multi-modal dataset, namely, PubChemSTM, with over 280,000 chemical structure–text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions, including structure–text retrieval and molecule editing. MoleculeSTM has two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM obtains the state-of-the-art generalization ability to novel biochemical concepts across various benchmarks. Machine learning methods in cheminformatics have made great progress in using chemical structures of molecules, but a large portion of textual information remains scarcely explored. Liu and colleagues trained MoleculeSTM, a foundation model that aligns the structure and text modalities through contrastive learning, and show its utility on the downstream tasks of structure–text retrieval, text-guided editing and molecular property prediction.
Room-temperature correlated states in twisted bilayer MoS$_2$
Fanfan Wu
Qiaoling Xu
Qinqin Wang
Yanbang Chu
Li Li
Jieying Liu
Jinpeng Tian
Yiru Ji
Le Liu
Yalong Yuan
Zhiheng Huang
Jiaojiao Zhao
Xiaozhou Zan
Kenji Watanabe
Takashi Taniguchi
Dongxia Shi
Gangxu Gu
Yang Xu
Lede Xian
Wei Yang
Luojun Du
Guangyu Zhang
PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design
Chuanrui Wang
Bozitao Zhong
Narendra Chaudhary
Sanchit Misra
Structure-based protein design has attracted increasing interest, with numerous methods being introduced in recent years. However, a universally accepted method for evaluation has not been established, since the wet-lab validation can be overly time-consuming for the development of new algorithms, and the
Large Language Models can Learn Rules
Yuan Xue
Xinyun Chen
Denny Zhou
Dale Schuurmans
Hanjun Dai
An Empirical Study of Retrieval-Enhanced Graph Neural Networks
Dingmin Wang
Hanchen Wang
Bernardo Cuenca Grau
Linfeng Song
Le Song
Qi Liu
Graph Neural Networks (GNNs) are effective tools for graph representation learning. Most GNNs rely on a recursive neighborhood aggregation scheme, named message passing, thereby their theoretical expressive power is limited to the first-order Weisfeiler-Lehman test (1-WL). An effective approach to this challenge is to explicitly retrieve some annotated examples used to enhance GNN models. While retrieval-enhanced models have been proved to be effective in many language and vision domains, it remains an open question how effective retrieval-enhanced GNNs are when applied to graph datasets. Motivated by this, we want to explore how the retrieval idea can help augment the useful information learned in the graph neural networks, and we design a retrieval-enhanced scheme called GRAPHRETRIEVAL, which is agnostic to the choice of graph neural network models. In GRAPHRETRIEVAL, for each input graph, similar graphs together with their ground-true labels are retrieved from an existing database. Thus they can act as a potential enhancement to complete various graph property predictive tasks. We conduct comprehensive experiments over 13 datasets, and we observe that GRAPHRETRIEVAL is able to reach substantial improvements over existing GNNs. Moreover, our empirical study also illustrates that retrieval enhancement is a promising remedy for alleviating the long-tailed label distribution problem.
Evaluating Self-Supervised Learning for Molecular Graph Embeddings
Hanchen Wang
Jean Kaddour
Matt J. Kusner
Joan Lasenby
Qi Liu
Graph Self-Supervised Learning (GSSL) provides a robust pathway for acquiring embeddings without expert labelling, a capability that carries profound implications for molecular graphs due to the staggering number of potential molecules and the high cost of obtaining labels. However, GSSL methods are designed not for optimisation within a specific domain but rather for transferability across a variety of downstream tasks. This broad applicability complicates their evaluation. Addressing this challenge, we present "Molecular Graph Representation Evaluation" (MOLGRAPHEVAL), generating detailed profiles of molecular graph embeddings with interpretable and diversified attributes. MOLGRAPHEVAL offers a suite of probing tasks grouped into three categories: (i) generic graph, (ii) molecular substructure, and (iii) embedding space properties. By leveraging MOLGRAPHEVAL to benchmark existing GSSL methods against both current downstream datasets and our suite of tasks, we uncover significant inconsistencies between inferences drawn solely from existing datasets and those derived from more nuanced probing. These findings suggest that current evaluation methodologies fail to capture the entirety of the landscape.