Bang Liu

rushil.gupta@umontreal.ca

Biography

Bang Liu is an assistant professor in the Department of Computer Science and Operations Research (DIRO), and a core member of the Applied Research in Computational Linguistics Lab (RALI) at Université de Montréal. He is also an associate academic member of Mila – Quebec Artificial Intelligence Institute and a Canada CIFAR AI Chair.

Liu received his BEng from the University of Science and Technology of China in 2013, and his MSc and PhD degrees from the University of Alberta in 2015 and 2020, respectively. His research interests lie primarily in the areas of natural language processing, multimodal and embodied learning, theory and techniques for AGI (e.g., understanding and improving large language models), and AI for science (e.g., health, material science, XR).

Current Students

Bibo Cai

Independent visiting researcher - Université de Montréal

Qianggang Ding

PhD - Université de Montréal

Rushil Gupta

Master's Research - Université de Montréal

Gauransh Kumar

Master's Research - Université de Montréal

Yizhan Li

Master's Research - Université de Montréal

Xiaotong Lyu

Research Intern - Université de Montréal

Jeremy Qin

Master's Research - Université de Montréal

Kyle Roth

PhD - Université de Montréal

Haochen Shi

PhD - Université de Montréal

Zhiyuan Sun

Master's Research - Université de Montréal

Jinghan Sun

Master's Research - Université de Montréal

PhD - Université de Montréal

Xiaoqiang Wang

PhD - Université de Montréal

Sifan Wu

PhD - Université de Montréal

Dekun Wu

PhD - Université de Montréal

Tony Yuan

Master's Research - Université de Montréal

Huan Zhang

Master's Research - Université de Montréal

Publications

VCR: Visual Caption Restoration

Tianyu Zhang

Suyuchen Wang

Lu Li

Ge Zhang

Perouz Taslakian

Sai Rajeswar

Jie Fu

Yoshua Bengio

We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially obscured texts using pixel-level hints within images. This task stems from the observation that text embedded in images is intrinsically different from common visual elements and natural language due to the need to align the modalities of vision, text, and text embedded in images. While numerous works have integrated text embedded in images into visual question-answering tasks, approaches to these tasks generally rely on optical character recognition or masked language modeling, thus reducing the task to mainly text-based processing. However, text-based processing becomes ineffective in VCR as accurate text restoration depends on the combined information from provided images, context, and subtle cues from the tiny exposed areas of masked texts. We develop a pipeline to generate synthetic images for the VCR task using image-caption pairs, with adjustable caption visibility to control the task difficulty. With this pipeline, we construct a dataset for VCR called VCR-Wiki using images with captions from Wikipedia, comprising 2.11M English and 346K Chinese entities in both easy and hard split variants. Our results reveal that current vision language models significantly lag behind human performance in the VCR task, and merely fine-tuning the models on our dataset does not lead to notable improvements. We release VCR-Wiki and the data construction code to facilitate future research.

2024-10-09

NeurIPS.cc/2024/Workshop/Sys2-Reasoning (poster)

T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval

Yili Li

Jing Yu

Keke Gai

Gang Xiong

Qi Wu

Current text-video retrieval methods mainly rely on cross-modal matching between queries and videos to calculate their similarity scores, which are then sorted to obtain retrieval results. This method considers the matching between each candidate video and the query, but it incurs a significant time cost that increases notably as the number of candidates grows. Generative models are common in natural language processing and computer vision, and have been successfully applied in document retrieval, but their application in multimodal retrieval remains unexplored. To enhance retrieval efficiency, in this paper, we introduce a model-based video indexer named T2VIndexer, which is a sequence-to-sequence generative model directly generating video identifiers and retrieving candidate videos with constant time complexity. T2VIndexer aims to reduce retrieval time while maintaining high accuracy. To achieve this goal, we propose video identifier encoding and query-identifier augmentation approaches to represent videos as short sequences while preserving their semantic information. Our method consistently enhances the retrieval efficiency of current state-of-the-art models on four standard datasets. It enables baselines with only 30%-50% of the original retrieval time to achieve better retrieval performance on MSR-VTT (+1.0%), MSVD (+1.8%), ActivityNet (+1.5%), and DiDeMo (+0.2%). The code is available at https://anonymous.4open.science/r/T2VIndexer-40BE.

2024-07-20

acmmm.org/ACMMM/2024/Conference (oral)

EiG-Search: Generating Edge-Induced Subgraphs for GNN Explanation in Linear Time

Shengyao Lu

Keith G Mills

Jiao He

Di Niu

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

proceedings.mlr.press

GOAt: Explaining Graph Neural Networks via Graph Output Attribution

Shengyao Lu

Keith G Mills

Jiao He

Di Niu

Understanding the decision-making process of Graph Neural Networks (GNNs) is crucial to their interpretability. Most existing methods for explaining GNNs rely on training auxiliary models, so the explanations themselves remain black-boxed. This paper introduces Graph Output Attribution (GOAt), a novel method to attribute graph outputs to input graph features, creating GNN explanations that are faithful, discriminative, and stable across similar samples. By expanding the GNN as a sum of scalar products involving node features, edge features and activation patterns, we propose an efficient analytical method to compute the contribution of each node or edge feature to each scalar product and aggregate the contributions from all scalar products in the expansion form to derive the importance of each node and edge. Through extensive experiments on synthetic and real-world data, we show that our method not only outperforms various state-of-the-art GNN explainers in terms of the commonly used fidelity metric, but also exhibits stronger discriminability and stability by a remarkable margin.

2024-01-16

ICLR.cc/2024/Conference (poster)

Efficient Classification of Long Documents via State-Space Models

Peng Lu

Suyuchen Wang

Mehdi Rezagholizadeh

Ivan Kobyzev

2023-10-07

EMNLP/2023/Conference (accepted)

HoneyBee: Progressive Instruction Finetuning of Large Language Models for Materials Science

Yu Song

Santiago Miret

Huan Zhang

2023-10-07

EMNLP/2023/Conference (published)

MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization

Yuyan Chen

Zhihao Wen

Ge Fan

Zhengyu Chen

Wei Wu

Dayiheng Liu

Zhixu Li

Yanghua Xiao

2023-10-07

EMNLP/2023/Conference (published)

SkillQG: Learning to Generate Question for Reading Comprehension Assessment

Xiaoqiang Wang

Siliang Tang

Lingfei Wu

2023-07-01

Findings of the Association for Computational Linguistics: ACL 2023 (published)

MatSci-NLP: Evaluating Scientific Language Models on Materials Science Language Tasks Using Text-to-Schema Modeling

Yurun Song

Santiago Miret

2023-05-14

ArXiv (preprint)

Fine-tuning Happens in Tiny Subspaces: Exploring Intrinsic Task-specific Subspaces of Pre-trained Language Models

Zhong Zhang

Junming Shao

2023-01-01

ACL (1) (published)

QRelScore: Better Evaluating Generated Questions with Deeper Understanding of Context-aware Relevance

Xiaoqiang Wang

Siliang Tang

Lingfei Wu

Existing metrics for assessing question generation not only require costly human references but also fail to take into account the input context of generation, and so lack a deep understanding of the relevance between the generated questions and input contexts. As a result, they may wrongly penalize a legitimate and reasonable candidate question when it (1) involves complicated reasoning with the context or (2) can be grounded by multiple pieces of evidence in the context. In this paper, we propose QRelScore, a context-aware Relevance evaluation metric for Question Generation. Based on off-the-shelf language models such as BERT and GPT2, QRelScore employs both word-level hierarchical matching and sentence-level prompt-based generation to cope with the complicated reasoning and diverse generation from multiple pieces of evidence, respectively. Compared with existing metrics, our experiments demonstrate that QRelScore is able to achieve a higher correlation with human judgments while being much more robust to adversarial samples.

2022-12-01

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (published)

Better Modeling the Programming World with Code Concept Graphs-augmented Multi-modal Learning

Martin Weyssow

Houari Sahraoui

The progress made in code modeling has been tremendous in recent years thanks to the design of natural language processing learning approaches based on state-of-the-art model architectures. Nevertheless, we believe that the current state-of-the-art does not focus enough on the full potential that data may bring to a learning process in software engineering. Our vision builds on the idea of leveraging multi-modal learning approaches to model the programming world. In this paper, we investigate one of the underlying ideas of our vision, whose objective, based on concept graphs of identifiers, is to leverage high-level relationships between domain concepts manipulated through particular language constructs. In particular, we propose to enhance an existing pretrained language model of code by jointly learning it with a graph neural network based on our concept graphs. We conducted a preliminary evaluation that shows a gain in effectiveness of the models for code search using a simple joint-learning method and prompts us to further investigate our research vision.

2022-05-22

2022 IEEE/ACM 44th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER) (published)