Portrait of Bang Liu

Bang Liu

Associate Academic Member
Canada CIFAR AI Chair
Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Data Mining
Deep Learning
Generative Models
Learning on Graphs
Natural Language Processing

Biography

Bang Liu is an assistant professor in the Department of Computer Science and Operations Research (DIRO), and a core member of the Applied Research in Computational Linguistics Lab (RALI) at Université de Montréal. He is also an associate academic member of Mila – Quebec Artificial Intelligence Institute and a Canada CIFAR AI Chair.

Liu received his BEng from the University of Science and Technology of China in 2013, and his MSc and PhD degrees from the University of Alberta in 2015 and 2020, respectively. His research interests lie primarily in the areas of natural language processing, multimodal and embodied learning, theory and techniques for AGI (e.g., understanding and improving large language models), and AI for science (e.g., health, material science, XR).

Current Students

Independent visiting researcher - Université de Montréal
PhD - Université de Montréal
Master's Research - Université de Montréal
Master's Research - Université de Montréal
Master's Research - Université de Montréal
Research Intern - Université de Montréal
Master's Research - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Master's Research - Université de Montréal
Master's Research - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Master's Research - Université de Montréal
Master's Research - Université de Montréal

Publications

S$^3$: Sign-Sparse-Shift Reparametrization for Effective Training of Low-bit Shift Networks
Xinlin Li
Yaoliang Yu
Wulong Liu
Chunjing Xu
Vahid Partovi Nia
Refining BERT Embeddings for Document Hashing via Mutual Information Maximization
Zijing Ou
Qinliang Su
Jianxing Yu
Ruihui Zhao
Yefeng Zheng
Existing unsupervised document hashing methods are mostly established on generative models. Due to the difficulties of capturing long depend… (see more)ency structures, these methods rarely model the raw documents directly, but instead to model the features extracted from them (e.g. bag-of-words (BOW), TFIDF). In this paper, we propose to learn hash codes from BERT embeddings after observing their tremendous successes on downstream tasks. As a first try, we modify existing generative hashing models to accommodate the BERT embeddings. However, little improvement is observed over the codes learned from the old BOW or TFIDF features. We attribute this to the reconstruction requirement in the generative hashing, which will enforce irrelevant information that is abundant in the BERT embeddings also compressed into the codes. To remedy this issue, a new unsupervised hashing paradigm is further proposed based on the mutual information (MI) maximization principle. Specifically, the method first constructs appropriate global and local codes from the documents and then seeks to maximize their mutual information. Experimental results on three benchmark datasets demonstrate that the proposed method is able to generate hash codes that outperform existing ones learned from BOW features by a substantial margin.
Graph Neural Networks in Natural Language Processing
Lingfei Wu
Natural language processing (NLP) and understanding aim to read from unformatted text to accomplish different tasks. While word embeddings l… (see more)earned by deep neural networks are widely used, the underlying linguistic and semantic structures of text pieces cannot be fully exploited in these representations. Graph is a natural way to capture the connections between different text pieces, such as entities, sentences, and documents. To overcome the limits in vector space models, researchers combine deep learning models with graph-structured representations for various tasks in NLP and text mining. Such combinations help to make full use of both the structural information in text and the representation learning ability of deep neural networks. In this chapter, we introduce the various graph representations that are extensively used in NLP, and show how different NLP tasks can be tackled from a graph perspective. We summarize recent research works on graph-based NLP, and discuss two case studies related to graph-based text clustering, matching, and multihop machine reading comprehension in detail. Finally, we provide a synthesis about the important open problems of this subfield.
Guiding the Growth: Difficulty-Controllable Question Generation through Step-by-Step Rewriting
Yi Cheng
Siyao Li
Ruihui Zhao
Sujian Li
Chenhua Lin
Yefeng Zheng
This paper explores the task of Difficulty-Controllable Question Generation (DCQG), which aims at generating questions with required difficu… (see more)lty levels. Previous research on this task mainly defines the difficulty of a question as whether it can be correctly answered by a Question Answering (QA) system, lacking interpretability and controllability. In our work, we redefine question difficulty as the number of inference steps required to answer it and argue that Question Generation (QG) systems should have stronger control over the logic of generated questions. To this end, we propose a novel framework that progressively increases question difficulty through step-by-step rewriting under the guidance of an extracted reasoning chain. A dataset is automatically constructed to facilitate the research, on which extensive experiments are conducted to test the performance of our method.
Integrating Semantics and Neighborhood Information with Graph-Driven Generative Models for Document Retrieval
Zijing Ou
Qinliang Su
Jianxing Yu
Jingwen Wang
Ruihui Zhao
Changyou Chen
Yefeng Zheng
With the need of fast retrieval speed and small memory footprint, document hashing has been playing a crucial role in large-scale informatio… (see more)n retrieval. To generate high-quality hashing code, both semantics and neighborhood information are crucial. However, most existing methods leverage only one of them or simply combine them via some intuitive criteria, lacking a theoretical principle to guide the integration process. In this paper, we encode the neighborhood information with a graph-induced Gaussian distribution, and propose to integrate the two types of information with a graph-driven generative model. To deal with the complicated correlations among documents, we further propose a tree-structured approximation method for learning. Under the approximation, we prove that the training objective can be decomposed into terms involving only singleton or pairwise documents, enabling the model to be trained as efficiently as uncorrelated ones. Extensive experimental results on three benchmark datasets show that our method achieves superior performance over state-of-the-art methods, demonstrating the effectiveness of the proposed model for simultaneously preserving semantic and neighborhood information.
Semantic and Syntactic Enhanced Aspect Sentiment Triplet Extraction
Zhexue Chen
Hong Huang
Xuanhua Feng Shi
Hai-nan Jin
Aspect Sentiment Triplet Extraction (ASTE) aims to extract triplets from sentences, where each triplet includes an entity, its associated se… (see more)ntiment, and the opinion span explaining the reason for the sentiment. Most existing research addresses this problem in a multi-stage pipeline manner, which neglects the mutual information between such three elements and has the problem of error propagation. In this paper, we propose a Semantic and Syntactic Enhanced aspect Sentiment triplet Extraction model (S3E2) to fully exploit the syntactic and semantic relationships between the triplet elements and jointly extract them. Specifically, we design a Graph-Sequence duel representation and modeling paradigm for the task of ASTE: we represent the semantic and syntactic relationships between word pairs in a sentence by graph and encode it by Graph Neural Networks (GNNs), as well as modeling the original sentence by LSTM to preserve the sequential information. Under this setting, we further apply a more efficient inference strategy for the extraction of triplets. Extensive evaluations on four benchmark datasets show that S3E2 significantly outperforms existing approaches, which proves our S3E2's superiority and flexibility in an end-to-end fashion.
Encoder-Decoder Neural Architecture Optimization for Keyword Spotting
Tong Mo
Enquire One’s Parent and Child Before Decision: Fully Exploit Hierarchical Structure for Self-Supervised Taxonomy Expansion
Suyuchen Wang
Ruihui Zhao
Xi Chen
Yefeng Zheng
Taxonomy is a hierarchically structured knowledge graph that plays a crucial role in machine intelligence. The taxonomy expansion task aims … (see more)to find a position for a new term in an existing taxonomy to capture the emerging knowledge in the world and keep the taxonomy dynamically updated. Previous taxonomy expansion solutions neglect valuable information brought by the hierarchical structure and evaluate the correctness of merely an added edge, which downgrade the problem to node-pair scoring or mini-path classification. In this paper, we propose the Hierarchy Expansion Framework (HEF), which fully exploits the hierarchical structure’s properties to maximize the coherence of expanded taxonomy. HEF makes use of taxonomy’s hierarchical structure in multiple aspects: i) HEF utilizes subtrees containing most relevant nodes as self-supervision data for a complete comparison of parental and sibling relations; ii) HEF adopts a coherence modeling module to evaluate the coherence of a taxonomy’s subtree by integrating hypernymy relation detection and several tree-exclusive features; iii) HEF introduces the Fitting Score for position selection, which explicitly evaluates both path and level selections and takes full advantage of parental relations to interchange information for disambiguation and self-correction. Extensive experiments show that by better exploiting the hierarchical structure and optimizing taxonomy’s coherence, HEF vastly surpasses the prior state-of-the-art on three benchmark datasets by an average improvement of 46.7% in accuracy and 32.3% in mean reciprocal rank.
Imperfect also Deserves Reward: Multi-Level and Sequential Reward Modeling for Better Dialog Management
Zhengxu Hou
Ruihui Zhao
Zijing Ou
Yafei Liu
Xi Chen
Yefeng Zheng
For task-oriented dialog systems, training a Reinforcement Learning (RL) based Dialog Management module suffers from low sample efficiency a… (see more)nd slow convergence speed due to the sparse rewards in RL. To solve this problem, many strategies have been proposed to give proper rewards when training RL, but their rewards lack interpretability and cannot accurately estimate the distribution of state-action pairs in real dialogs. In this paper, we propose a multi-level reward modeling approach that factorizes a reward into a three-level hierarchy: domain, act, and slot. Based on inverse adversarial reinforcement learning, our designed reward model can provide more accurate and explainable reward signals for state-action pairs. Extensive evaluations show that our approach can be applied to a wide range of reinforcement learning-based dialog systems and significantly improves both the performance and the speed of convergence.
Noised Consistency Training for Text Summarization
J. Liu
Qianren Mao
Hao Peng
Hongdong Zhu
Jianxin Li
Neural abstractive summarization methods often require large quantities of labeled training data. However, labeling large amounts of summari… (see more)zation data is often prohibitive due to time, financial, and expertise constraints, which has limited the usefulness of summarization systems to practical applications. In this paper, we argue that this limitation can be overcome by a semi-supervised approach: consistency training which is to leverage large amounts of unlabeled data to improve the performance of supervised learning over a small corpus. The consistency regularization semi-supervised learning can regularize model predictions to be invariant to small noise applied to input articles. By adding noised unlabeled corpus to help regularize consistency training, this framework obtains comparative performance without using the full dataset. In particular, we have verified that leveraging large amounts of unlabeled data decently improves the performance of supervised learning over an insufficient labeled dataset.
QBSUM: a Large-Scale Query-Based Document Summarization Dataset from Real-world Applications
Mingjun Zhao
Shengli Yan
Xinwang Zhong
Qian Hao
Haolan Chen
Di Niu
Bo Long
Wei-dong Guo
GIANT: Scalable Creation of a Web-scale Ontology
Weidong Guo
Di Niu
Jinwen Luo
Chaoyue Wang
Zhen Wen
Yu Xu