Bang Liu

Associate Academic Member
Canada CIFAR AI Chair
Associate Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Deep Learning
Learning on Graphs
Data Mining
Generative Models
Natural Language Processing

Biography

Bang Liu is an assistant professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal. He is a member of the Applied Research in Computational Linguistics Laboratory (RALI) at DIRO, an associate member of Mila (Quebec Artificial Intelligence Institute), and holds a Canada CIFAR AI Chair.

He received a Bachelor of Engineering from the University of Science and Technology of China (USTC) in 2013, and a Master of Science and a PhD from the University of Alberta in 2015 and 2020, respectively. His research focuses primarily on natural language processing, multimodal and embodied learning, the theory and techniques of artificial intelligence (for example, understanding and improving large language models), and artificial intelligence for science (for example, health, materials science, and radiology).

Current Students

Independent Visiting Researcher - UdeM
PhD - UdeM
Research Master's - UdeM
Research Master's - UdeM
Research Intern - UdeM
Research Master's - UdeM
PhD - UdeM
PhD - UdeM
Research Master's - UdeM
Research Master's - UdeM
PhD - UdeM
PhD - UdeM
PhD - UdeM
PhD - UdeM
Research Master's - UdeM
Research Master's - UdeM

Publications

Story Forest
Fred X. Han
Di Niu
Linglong Kong
Kunfeng Lai
Yu Xu
Extracting events accurately from vast news corpora and organizing them logically is critical for news apps and search engines, which aim to organize news information collected from the Internet and present it to users in the most sensible forms. Intuitively speaking, an event is a group of news documents that report the same news incident, possibly in different ways. In this article, we describe our experience of implementing a news content organization system at Tencent to discover events from vast streams of breaking news and to evolve news story structures in an online fashion. Our real-world system faces unique challenges in contrast to previous studies on topic detection and tracking (TDT) and event timeline or graph generation, in that we (1) need to accurately and quickly extract distinguishable events from massive streams of long text documents, and (2) must develop the structures of event stories in an online manner, in order to guarantee a consistent user viewing experience. In solving these challenges, we propose Story Forest, a set of online schemes that automatically clusters streaming documents into events, while connecting related events in growing trees to tell evolving stories. A core novelty of our Story Forest system is EventX, a semi-supervised scheme to extract events from massive Internet news corpora. EventX relies on a two-layered, graph-based clustering procedure to group documents into fine-grained events. We conducted extensive evaluations based on (1) 60 GB of real-world Chinese news data, (2) a large Chinese Internet news dataset that contains 11,748 news articles with truth event labels, and (3) the 20 News Groups English dataset, through detailed pilot user experience studies. The results demonstrate the superior capabilities of Story Forest to accurately identify events and organize news text into a logical structure that is appealing to human readers.
Asking Questions the Human Way: Scalable Question-Answer Generation from Text Corpus
Haojie Wei
Di Niu
Haolan Chen
Yancheng He
The ability to ask questions is important in both human and machine intelligence. Learning to ask questions helps knowledge acquisition, improves question-answering and machine reading comprehension tasks, and helps a chatbot to keep the conversation flowing with a human. Existing question generation models are ineffective at generating a large amount of high-quality question-answer pairs from unstructured text, since given an answer and an input passage, question generation is inherently a one-to-many mapping. In this paper, we propose Answer-Clue-Style-aware Question Generation (ACS-QG), which aims at automatically generating high-quality and diverse question-answer pairs from unlabeled text corpus at scale by imitating the way a human asks questions. Our system consists of: i) an information extractor, which samples from the text multiple types of assistive information to guide question generation; ii) neural question generators, which generate diverse and controllable questions, leveraging the extracted assistive information; and iii) a neural quality controller, which removes low-quality generated data based on text entailment. We compare our question generation models with existing approaches and resort to voluntary human evaluation to assess the quality of the generated question-answer pairs. The evaluation results suggest that our system dramatically outperforms state-of-the-art neural question generation models in terms of the generation quality, while being scalable in the meantime. With models trained on a relatively smaller amount of data, we can generate 2.8 million quality-assured question-answer pairs from a million sentences found in Wikipedia.
Natural Language Processing and Text Mining with Graph-Structured Representations