Portrait de Bang Liu

Bang Liu

Membre académique associé
Chaire en IA Canada-CIFAR
Professeur agrégé, Université de Montréal, Département d'informatique et de recherche opérationnelle
Sujets de recherche
Apprentissage profond
Apprentissage sur graphes
Exploration des données
Modèles génératifs
Traitement du langage naturel

Biographie

Bang Liu est professeur adjoint au Département d'informatique et de recherche opérationnelle (DIRO) de l'Université de Montréal. Il est membre du Laboratoire de recherche appliquée en linguistique informatique (RALI) du DIRO, membre associé de Mila – Institut québécois d'intelligence artificielle, et titulaire d'une chaire en IA Canada-CIFAR.

Il a obtenu un baccalauréat en ingénierie de l'Université des sciences et technologies de Chine (USTC) en 2013, ainsi qu’une maîtrise ès sciences et un doctorat de l'Université de l'Alberta en 2015 et en 2020, respectivement. Ses recherches portent principalement sur le traitement du langage naturel, l'apprentissage multimodal et incarné, la théorie et les techniques de l'intelligence artificielle (par exemple, la compréhension et l'amélioration de grands modèles de langage) et l'intelligence artificielle pour la science (par exemple, la santé, la science des matériaux et la radiologie).

Étudiants actuels

Doctorat - UdeM
Maîtrise recherche - UdeM
Doctorat - UdeM
Maîtrise recherche - UdeM
Doctorat - UdeM
Doctorat - UdeM
Maîtrise recherche - UdeM
Stagiaire de recherche - UdeM
Doctorat - UdeM
Doctorat - UdeM
Doctorat - UdeM
Doctorat - UdeM
Maîtrise recherche - UdeM

Publications

VCR: Visual Caption Restoration
Tianyu Zhang
Suyuchen Wang
Lu Li
Ge Zhang
Perouz Taslakian
Sai Rajeswar
Jie Fu
We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially obscured … (voir plus)texts using pixel-level hints within images. This task stems from the observation that text embedded in images is intrinsically different from common visual elements and natural language due to the need to align the modalities of vision, text, and text embedded in images. While numerous works have integrated text embedded in images into visual question-answering tasks, approaches to these tasks generally rely on optical character recognition or masked language modeling, thus reducing the task to mainly text-based processing. However, text-based processing becomes ineffective in VCR as accurate text restoration depends on the combined information from provided images, context, and subtle cues from the tiny exposed areas of masked texts. We develop a pipeline to generate synthetic images for the VCR task using image-caption pairs, with adjustable caption visibility to control the task difficulty. With this pipeline, we construct a dataset for VCR called VCR-Wiki using images with captions from Wikipedia, comprising 2.11M English and 346K Chinese entities in both easy and hard split variants. Our results reveal that current vision language models significantly lag behind human performance in the VCR task, and merely fine-tuning the models on our dataset does not lead to notable improvements. We release VCR-Wiki and the data construction code to facilitate future research.
VCR: Visual Caption Restoration
Tianyu Zhang
Suyuchen Wang
Lu Li
Ge Zhang
Perouz Taslakian
Sai Rajeswar
Jie Fu
We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially obscured … (voir plus)texts using pixel-level hints within images. This task stems from the observation that text embedded in images is intrinsically different from common visual elements and natural language due to the need to align the modalities of vision, text, and text embedded in images. While numerous works have integrated text embedded in images into visual question-answering tasks, approaches to these tasks generally rely on optical character recognition or masked language modeling, thus reducing the task to mainly text-based processing. However, text-based processing becomes ineffective in VCR as accurate text restoration depends on the combined information from provided images, context, and subtle cues from the tiny exposed areas of masked texts. We develop a pipeline to generate synthetic images for the VCR task using image-caption pairs, with adjustable caption visibility to control the task difficulty. With this pipeline, we construct a dataset for VCR called VCR-Wiki using images with captions from Wikipedia, comprising 2.11M English and 346K Chinese entities in both easy and hard split variants. Our results reveal that current vision language models significantly lag behind human performance in the VCR task, and merely fine-tuning the models on our dataset does not lead to notable improvements. We release VCR-Wiki and the data construction code to facilitate future research.
VCR: Visual Caption Restoration
Tianyu Zhang
Suyuchen Wang
Lu Li
Ge Zhang
Perouz Taslakian
Sai Rajeswar
Jie Fu
We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially obscured … (voir plus)texts using pixel-level hints within images. This task stems from the observation that text embedded in images is intrinsically different from common visual elements and natural language due to the need to align the modalities of vision, text, and text embedded in images. While numerous works have integrated text embedded in images into visual question-answering tasks, approaches to these tasks generally rely on optical character recognition or masked language modeling, thus reducing the task to mainly text-based processing. However, text-based processing becomes ineffective in VCR as accurate text restoration depends on the combined information from provided images, context, and subtle cues from the tiny exposed areas of masked texts. We develop a pipeline to generate synthetic images for the VCR task using image-caption pairs, with adjustable caption visibility to control the task difficulty. With this pipeline, we construct a dataset for VCR called VCR-Wiki using images with captions from Wikipedia, comprising 2.11M English and 346K Chinese entities in both easy and hard split variants. Our results reveal that current vision language models significantly lag behind human performance in the VCR task, and merely fine-tuning the models on our dataset does not lead to notable improvements. We release VCR-Wiki and the data construction code to facilitate future research.
GOAt: Explaining Graph Neural Networks via Graph Output Attribution
Shengyao Lu
Keith G Mills
Jiao He
Di Niu
Understanding the decision-making process of Graph Neural Networks (GNNs) is crucial to their interpretability. Most existing methods for ex… (voir plus)plaining GNNs typically rely on training auxiliary models, resulting in the explanations remain black-boxed. This paper introduces Graph Output Attribution (GOAt), a novel method to attribute graph outputs to input graph features, creating GNN explanations that are faithful, discriminative, as well as stable across similar samples. By expanding the GNN as a sum of scalar products involving node features, edge features and activation patterns, we propose an efficient analytical method to compute contribution of each node or edge feature to each scalar product and aggregate the contributions from all scalar products in the expansion form to derive the importance of each node and edge. Through extensive experiments on synthetic and real-world data, we show that our method not only outperforms various state-of-the-art GNN explainers in terms of the commonly used fidelity metric, but also exhibits stronger discriminability, and stability by a remarkable margin.
Efficient Classification of Long Documents via State-Space Models
Peng Lu
Suyuchen Wang
Mehdi Rezagholizadeh
Ivan Kobyzev
HoneyBee: Progressive Instruction Finetuning of Large Language Models for Materials Science
Yu Song
Santiago Miret
Huan Zhang
MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization
Yuyan Chen
Zhihao Wen
Ge Fan
Zhengyu Chen
Wei Wu
Dayiheng Liu
Zhixu Li
Yanghua Xiao
SkillQG: Learning to Generate Question for Reading Comprehension Assessment
Xiaoqiang Wang
Siliang Tang
Lingfei Wu
MatSci-NLP: Evaluating Scientific Language Models on Materials Science Language Tasks Using Text-to-Schema Modeling
Yurun Song
Santiago Miret
Fine-tuning Happens in Tiny Subspaces: Exploring Intrinsic Task-specific Subspaces of Pre-trained Language Models
Zhong Zhang
Junming Shao
QRelScore: Better Evaluating Generated Questions with Deeper Understanding of Context-aware Relevance
Xiaoqiang Wang
Siliang Tang
Lingfei Wu
Existing metrics for assessing question generation not only require costly human reference but also fail to take into account the input cont… (voir plus)ext of generation, rendering the lack of deep understanding of the relevance between the generated questions and input contexts. As a result, they may wrongly penalize a legitimate and reasonable candidate question when it (1) involves complicated reasoning with the context or (2) can be grounded by multiple evidences in the context.In this paper, we propose QRelScore, a context-aware Relevance evaluation metric for Question Generation.Based on off-the-shelf language models such as BERT and GPT2, QRelScore employs both word-level hierarchical matching and sentence-level prompt-based generation to cope with the complicated reasoning and diverse generation from multiple evidences, respectively.Compared with existing metrics, our experiments demonstrate that QRelScore is able to achieve a higher correlation with human judgments while being much more robust to adversarial samples.
Better Modeling the Programming World with Code Concept Graphs-augmented Multi-modal Learning
Martin Weyssow
Houari Sahraoui
The progress made in code modeling has been tremendous in recent years thanks to the design of natural language processing learning approach… (voir plus)es based on state-of-the-art model architectures. Nevertheless, we believe that the current state-of-the-art does not focus enough on the full potential that data may bring to a learning process in software engineering. Our vision articulates on the idea of leveraging multi-modal learning approaches to modeling the programming world. In this paper, we investigate one of the underlying idea of our vision whose objective based on concept graphs of identifiers aims at leveraging high-level relationships between domain concepts manipulated through particular language constructs. In particular, we propose to enhance an existing pretrained language model of code by joint-learning it with a graph neural network based on our concept graphs. We conducted a preliminary evaluation that shows gain of effectiveness of the models for code search using a simple joint-learning method and prompts us to further investigate our research vision.