
Bang Liu

Associate Academic Member
Canada CIFAR AI Chair
Associate Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Deep Learning
Graph Learning
Data Mining
Generative Models
Natural Language Processing

Biography

Bang Liu is an assistant professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal. He is a member of DIRO's Applied Research in Computational Linguistics laboratory (RALI), an associate member of Mila – Quebec Artificial Intelligence Institute, and a Canada CIFAR AI Chair.

He received a bachelor's degree in engineering from the University of Science and Technology of China (USTC) in 2013, and an MSc and a PhD from the University of Alberta in 2015 and 2020, respectively. His research focuses mainly on natural language processing, multimodal and embodied learning, the theory and techniques of artificial intelligence (e.g., understanding and improving large language models), and AI for science (e.g., health, materials science, and radiology).

Current Students

Independent Visiting Researcher - UdeM
PhD - UdeM
Research Master's - UdeM
Research Master's - UdeM
Research Intern - UdeM
Research Master's - UdeM
PhD - UdeM
PhD - UdeM
Research Master's - UdeM
Research Master's - UdeM
PhD - UdeM
PhD - UdeM
PhD - UdeM
PhD - UdeM
Research Master's - UdeM
Research Master's - UdeM

Publications

T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval
Yili Li
Jing Yu
Keke Gai
Gang Xiong
Qi Wu
Current text-video retrieval methods mainly rely on cross-modal matching between queries and videos to calculate similarity scores, which are then sorted to obtain retrieval results. This approach matches each candidate video against the query, but it incurs a significant time cost that grows notably as the number of candidates increases. Generative models are common in natural language processing and computer vision and have been successfully applied to document retrieval, but their application to multimodal retrieval remains unexplored. To improve retrieval efficiency, in this paper we introduce a model-based video indexer named T2VIndexer, a sequence-to-sequence generative model that directly generates video identifiers and retrieves candidate videos with constant time complexity. T2VIndexer aims to reduce retrieval time while maintaining high accuracy. To achieve this goal, we propose video identifier encoding and query-identifier augmentation approaches that represent videos as short sequences while preserving their semantic information. Our method consistently improves the retrieval efficiency of current state-of-the-art models on four standard datasets: it enables baselines to achieve better retrieval performance using only 30%-50% of the original retrieval time on MSR-VTT (+1.0%), MSVD (+1.8%), ActivityNet (+1.5%), and DiDeMo (+0.2%). The code is available at https://anonymous.4open.science/r/T2VIndexer-40BE.
EiG-Search: Generating Edge-Induced Subgraphs for GNN Explanation in Linear Time
Shengyao Lu
Keith G Mills
Jiao He
Di Niu
VCR: Visual Caption Restoration
Tianyu Zhang
Suyuchen Wang
Lu Li
Ge Zhang
Perouz Taslakian
Sai Rajeswar
Jie Fu
We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially obscured texts using pixel-level hints within images. This task stems from the observation that text embedded in images is intrinsically different from common visual elements and natural language due to the need to align the modalities of vision, text, and text embedded in images. While numerous works have integrated text embedded in images into visual question-answering tasks, approaches to these tasks generally rely on optical character recognition or masked language modeling, thus reducing the task to mainly text-based processing. However, text-based processing becomes ineffective in VCR, as accurate text restoration depends on the combined information from the provided images, the context, and subtle cues from the tiny exposed areas of masked texts. We develop a pipeline to generate synthetic images for the VCR task from image-caption pairs, with adjustable caption visibility to control task difficulty. With this pipeline, we construct a dataset for VCR called VCR-Wiki using images with captions from Wikipedia, comprising 2.11M English and 346K Chinese entities in both easy and hard split variants. Our results reveal that current vision-language models lag significantly behind human performance on the VCR task, and that merely fine-tuning the models on our dataset does not lead to notable improvements. We release VCR-Wiki and the data construction code to facilitate future research.
GOAt: Explaining Graph Neural Networks via Graph Output Attribution
Shengyao Lu
Keith G Mills
Jiao He
Di Niu
Understanding the decision-making process of Graph Neural Networks (GNNs) is crucial to their interpretability. Most existing methods for explaining GNNs rely on training auxiliary models, which leaves the explanations themselves black-boxed. This paper introduces Graph Output Attribution (GOAt), a novel method that attributes graph outputs to input graph features, creating GNN explanations that are faithful, discriminative, and stable across similar samples. By expanding the GNN as a sum of scalar products involving node features, edge features, and activation patterns, we propose an efficient analytical method that computes the contribution of each node or edge feature to each scalar product and aggregates the contributions from all scalar products in the expanded form to derive the importance of each node and edge. Through extensive experiments on synthetic and real-world data, we show that our method not only outperforms various state-of-the-art GNN explainers on the commonly used fidelity metric, but also exhibits stronger discriminability and stability by a remarkable margin.
Efficient Classification of Long Documents via State-Space Models
Peng Lu
Suyuchen Wang
Mehdi Rezagholizadeh
Ivan Kobyzev
HoneyBee: Progressive Instruction Finetuning of Large Language Models for Materials Science
Yu Song
Santiago Miret
Huan Zhang
MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization
Yuyan Chen
Zhihao Wen
Ge Fan
Zhengyu Chen
Wei Wu
Dayiheng Liu
Zhixu Li
Yanghua Xiao
SkillQG: Learning to Generate Question for Reading Comprehension Assessment
Xiaoqiang Wang
Siliang Tang
Lingfei Wu
MatSci-NLP: Evaluating Scientific Language Models on Materials Science Language Tasks Using Text-to-Schema Modeling
Yurun Song
Santiago Miret
Fine-tuning Happens in Tiny Subspaces: Exploring Intrinsic Task-specific Subspaces of Pre-trained Language Models
Zhong Zhang
Junming Shao
QRelScore: Better Evaluating Generated Questions with Deeper Understanding of Context-aware Relevance
Xiaoqiang Wang
Siliang Tang
Lingfei Wu
Existing metrics for assessing question generation not only require costly human references but also fail to take into account the input context of generation, and therefore lack a deep understanding of the relevance between the generated questions and the input contexts. As a result, they may wrongly penalize a legitimate and reasonable candidate question when it (1) involves complicated reasoning with the context or (2) can be grounded by multiple pieces of evidence in the context. In this paper, we propose QRelScore, a context-aware relevance evaluation metric for question generation. Based on off-the-shelf language models such as BERT and GPT-2, QRelScore employs both word-level hierarchical matching and sentence-level prompt-based generation to cope with complicated reasoning and with diverse generation from multiple pieces of evidence, respectively. Compared with existing metrics, our experiments demonstrate that QRelScore achieves a higher correlation with human judgments while being much more robust to adversarial samples.
Better Modeling the Programming World with Code Concept Graphs-augmented Multi-modal Learning
Martin Weyssow
Houari Sahraoui
The progress made in code modeling has been tremendous in recent years thanks to the design of natural language processing learning approaches based on state-of-the-art model architectures. Nevertheless, we believe that the current state of the art does not focus enough on the full potential that data may bring to a learning process in software engineering. Our vision centers on the idea of leveraging multi-modal learning approaches to model the programming world. In this paper, we investigate one of the ideas underlying this vision: using concept graphs of identifiers to leverage high-level relationships between domain concepts manipulated through particular language constructs. In particular, we propose to enhance an existing pretrained language model of code by jointly training it with a graph neural network based on our concept graphs. We conducted a preliminary evaluation that shows gains in effectiveness for code search using a simple joint-learning method, which prompts us to further investigate our research vision.