Bang Liu

Junming Shao

2023-01-01

ACL (1) (publié)

QRelScore: Better Evaluating Generated Questions with Deeper Understanding of Context-aware Relevance

Xiaoqiang Wang

Siliang Tang

Lingfei Wu

Existing metrics for assessing question generation not only require costly human reference but also fail to take into account the input cont… (voir plus)ext of generation, rendering the lack of deep understanding of the relevance between the generated questions and input contexts. As a result, they may wrongly penalize a legitimate and reasonable candidate question when it (1) involves complicated reasoning with the context or (2) can be grounded by multiple evidences in the context.In this paper, we propose QRelScore, a context-aware Relevance evaluation metric for Question Generation.Based on off-the-shelf language models such as BERT and GPT2, QRelScore employs both word-level hierarchical matching and sentence-level prompt-based generation to cope with the complicated reasoning and diverse generation from multiple evidences, respectively.Compared with existing metrics, our experiments demonstrate that QRelScore is able to achieve a higher correlation with human judgments while being much more robust to adversarial samples.

2022-12-01

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (publié)

Better Modeling the Programming World with Code Concept Graphs-augmented Multi-modal Learning

Martin Weyssow

Houari Sahraoui

The progress made in code modeling has been tremendous in recent years thanks to the design of natural language processing learning approach… (voir plus)es based on state-of-the-art model architectures. Nevertheless, we believe that the current state-of-the-art does not focus enough on the full potential that data may bring to a learning process in software engineering. Our vision articulates on the idea of leveraging multi-modal learning approaches to modeling the programming world. In this paper, we investigate one of the underlying idea of our vision whose objective based on concept graphs of identifiers aims at leveraging high-level relationships between domain concepts manipulated through particular language constructs. In particular, we propose to enhance an existing pretrained language model of code by joint-learning it with a graph neural network based on our concept graphs. We conducted a preliminary evaluation that shows gain of effectiveness of the models for code search using a simple joint-learning method and prompts us to further investigate our research vision.

2022-05-22

2022 IEEE/ACM 44th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER) (publié)

Grow-and-Clip: Informative-yet-Concise Evidence Distillation for Answer Explanation

Yuyan Chen

Yanghua Xiao

Interpreting the predictions of existing Question Answering (QA) models is critical to many real-world intelligent applications, such as QA … (voir plus)systems for healthcare, education, and finance. However, existing QA models lack interpretability and provide no feedback or explanation for end-users to help them understand why a specific prediction is the answer to a question. In this research, we argue that the evidences of an answer is critical to enhancing the interpretability of QA models. Unlike previous research that simply extracts several sentence(s) in the context as evidence, we are the first to explicitly define the concept of evidence as the supporting facts in a context which are informative, concise, and readable. Besides, we provide effective strategies to quantitatively measure the informativeness, conciseness and readability of evidence. Furthermore, we propose Grow-and-Clip Evidence Distillation (GCED) algorithm to extract evidences from the contexts by trade-off informativeness, conciseness, and readability. We conduct extensive experiments on the SQuAD and TriviaQA datasets with several baseline models to evaluate the effect of GCED on interpreting answers to questions. Human evaluation are also carried out to check the quality of distilled evidences. Experimental results show that automatic distilled evidences have human-like informativeness, conciseness and readability, which can enhance the interpretability of the answers to questions.

2022-05-09

2022 IEEE 38th International Conference on Data Engineering (ICDE) (publié)

Tell Me How to Survey: Literature Review Made Simple with Automatic Reading Path Generation

Jiayuan Ding

Tong Xiang

Zijing Ou

Wangyang Zuo

Ruihui Zhao

Chenhua Lin

Yefeng Zheng

Recent years have witnessed the dramatic growth of paper volumes with plenty of new research papers published every day, especially in the a… (voir plus)rea of computer science. How to glean papers worth reading from the massive literature to do a quick survey or keep up with the latest advancement about a specific research topic has become a challenging task. Existing academic search engines return relevant papers by individually calculating the relevance between each paper and query. However, such systems usually omit the prerequisite chains of a research topic and cannot form a meaningful reading path. In this paper, we introduce a new task named Reading Path Generation (RPG) which aims at automatically producing a path of papers to read for a given query. To serve as a research benchmark, we further propose SurveyBank, a dataset consisting of large quantities of survey papers in the field of computer science as well as their citation relationships. Furthermore, we propose a graph-optimization-based approach for reading path generation which takes the relationship between papers into account. Extensive evaluations demonstrate that our approach outperforms other baselines. A real-time Reading Path Generation (RePaGer) system has been also implemented with our designed model. Our source code and SurveyBank dataset can be found here11https://github.com/JiayuanDing100/Reading-Path-Generation.

2022-05-09

2022 IEEE 38th International Conference on Data Engineering (ICDE) (publié)

Accepted Tutorials at The Web Conference 2022

Riccardo Tommasini

Senjuti Basu Roy

Xuan Wang

Hongwei Wang

Heng Ji

Jiawei Han

Preslav Nakov

Giovanni Da San Martino

Firoj Alam

Markus Schedl

Elisabeth Lex

Akash Bharadwaj

Graham Cormode

Milan Dojchinovski

Jan Forberg

Johannes Frey

Pieter Bonte

Marco Balduini

Matteo Belcao

Emanuele Della Valle … (voir 53 de plus)

Junliang Yu

Hongzhi Yin

Tong Chen

Haochen Liu

Yiqi Wang

Wenqi Fan

Xiaorui Liu

Jamell Dacon

Lingjuan Lye

Jiliang Tang

Aristides Gionis

Stefan Neumann

Bruno Ordozgoiti

Simon Razniewski

Hiba Arnaout

Shrestha Ghosh

Fabian Suchanek

Lingfei Wu

Yu Chen

Yunyao Li

Filip Ilievski

Daniel Garijo

Hans Chalupsky

Pedro Szekely

Ilias Kanellos

Dimitris Sacharidis

Thanasis Vergoulis

Nurendra Choudhary

Nikhil Rao

Karthik Subbian

Srinivasan Sengamedu

Chandan K. Reddy

Friedhelm Victor

Bernhard Haslhofer

George Katsogiannis- Meimarakis

Georgia Koutrika

Shengmin Jin

Danai Koutra

Reza Zafarani

Yulia Tsvetkov

Vidhisha Balachandran

Sachin Kumar

Xiangyu Zhao

Bo Chen

Huifeng Guo

Yejing Wang

Ruiming Tang

Yang Zhang

Wenjie Wang

Peng Wu

Fuli Feng

Xiangnan He

This paper summarizes the content of the 20 tutorials that have been given at The Web Conference 2022: 85% of these tutorials are lecture st… (voir plus)yle, and 15% of these are hands on.

2022-04-25

The Web Conference (publié)

QEN: Applicable Taxonomy Completion via Evaluating Full Taxonomic Relations

Suyuchen Wang

Ruihui Zhao

Yefeng Zheng

Taxonomy is a fundamental type of knowledge graph for a wide range of web applications like searching and recommendation systems. To keep a … (voir plus)taxonomy automatically updated with the latest concepts, the taxonomy completion task matches a pair of proper hypernym and hyponym in the original taxonomy with the new concept as its parent and child. Previous solutions utilize term embeddings as input and only evaluate the parent-child relations between the new concept and the hypernym-hyponym pair. Such methods ignore the important sibling relations, and are not applicable in reality since term embeddings are not available for the latest concepts. They also suffer from the relational noise of the “pseudo-leaf” node, which is a null node acting as a node’s hyponym to enable the new concept to be a leaf node. To tackle the above drawbacks, we propose the Quadruple Evaluation Network (QEN), a novel taxonomy completion framework that utilizes easily accessible term descriptions as input, and applies pretrained language model and code attention for accurate inference while reducing online computation. QEN evaluates both parent-child and sibling relations to both enhance the accuracy and reduce the noise brought by pseudo-leaf. Extensive experiments on three real-world datasets in different domains with different sizes and term description sources prove the effectiveness and robustness of QEN on overall performance and especially the performance for adding non-leaf nodes, which largely surpasses previous methods and achieves the new state-of-the-art of the task.1

2022-04-25

The Web Conference (publié)

QEN: Applicable Taxonomy Completion via Evaluating Full Taxonomic Relations

Suyuchen Wang

Ruihui Zhao

Yefeng Zheng

Taxonomy is a fundamental type of knowledge graph for a wide range of web applications like searching and recommendation systems. To keep a … (voir plus)taxonomy automatically updated with the latest concepts, the taxonomy completion task matches a pair of proper hypernym and hyponym in the original taxonomy with the new concept as its parent and child. Previous solutions utilize term embeddings as input and only evaluate the parent-child relations between the new concept and the hypernym-hyponym pair. Such methods ignore the important sibling relations, and are not applicable in reality since term embeddings are not available for the latest concepts. They also suffer from the relational noise of the “pseudo-leaf” node, which is a null node acting as a node’s hyponym to enable the new concept to be a leaf node. To tackle the above drawbacks, we propose the Quadruple Evaluation Network (QEN), a novel taxonomy completion framework that utilizes easily accessible term descriptions as input, and applies pretrained language model and code attention for accurate inference while reducing online computation. QEN evaluates both parent-child and sibling relations to both enhance the accuracy and reduce the noise brought by pseudo-leaf. Extensive experiments on three real-world datasets in different domains with different sizes and term description sources prove the effectiveness and robustness of QEN on overall performance and especially the performance for adding non-leaf nodes, which largely surpasses previous methods and achieves the new state-of-the-art of the task.1

2022-04-25

The Web Conference (published)

Learning What You Need from What You Did: Product Taxonomy Expansion with User Behaviors Supervision

Sijie Cheng

Zhouhong Gu

Rui Xie

Wei Wu

Yanghua Xiao

Taxonomies have been widely used in various domains to underpin numerous applications. Specially, product taxonomies serve an essential role… (voir plus) in the e-commerce domain for the recommendation, browsing, and query understanding. However, taxonomies need to constantly capture the newly emerged terms or concepts in e-commerce platforms to keep up-to-date, which is expensive and labor-intensive if it relies on manual maintenance and updates. Therefore, we target the taxonomy expansion task to attach new concepts to existing taxonomies automatically. In this paper, we present a self-supervised and user behavior-oriented product taxonomy expansion framework to append new concepts into existing taxonomies. Our framework extracts hyponymy relations that conform to users' intentions and cognition. Specifically, i) to fully exploit user behavioral information, we extract candidate hyponymy relations that match user interests from query-click concepts; ii) to enhance the semantic information of new concepts and better detect hyponymy relations, we model concepts and relations through both user-generated content and structural information in existing taxonomies and user click logs, by leveraging Pre-trained Language Models and Graph Neural Network combined with Contrastive Learning; iii) to reduce the cost of dataset construction and overcome data skews, we construct a high-quality and balanced training dataset from existing taxonomy with no supervision. Extensive experiments on real-world product taxonomies in Meituan Platform, a leading Chinese vertical e-commerce platform to order take-out with more than 70 million daily active users, demonstrate the superiority of our proposed framework over state-of-the-art methods. Notably, our method enlarges the size of real-world product taxonomies from 39,263 to 94,698 relations with 88% precision. Our implementation is available: https://github.com/AdaCheng/Product_Taxonomy_Expansion.

2022-03-28

ArXiv (preprint)

R5: Rule Discovery with Reinforced and Recurrent Relational Reasoning

Shengyao Lu

Keith G Mills

SHANGLING JUI

Di Niu

Systematicity, i.e., the ability to recombine known parts and rules to form new sequences while reasoning over relational data, is critical … (voir plus)to machine intelligence. A model with strong systematicity is able to train on small-scale tasks and generalize to large-scale tasks. In this paper, we propose R5, a relational reasoning framework based on reinforcement learning that reasons over relational graph data and explicitly mines underlying compositional logical rules from observations. R5 has strong systematicity and being robust to noisy data. It consists of a policy value network equipped with Monte Carlo Tree Search to perform recurrent relational prediction and a backtrack rewriting mechanism for rule mining. By alternately applying the two components, R5 progressively learns a set of explicit rules from data and performs explainable and generalizable relation prediction. We conduct extensive evaluations on multiple datasets. Experimental results show that R5 outperforms various embedding-based and rule induction baselines on relation prediction tasks while achieving a high recall rate in discovering ground truth rules.

2022-01-28

ICLR.cc/2022/Conference (spotlight)

openreview.net

Grow-and-Clip: Informative-yet-Concise Evidence Distillation for Answer Explanation

Yuyan Chen

Yanghua Xiao

Interpreting the predictions of existing Question Answering (QA) models is critical to many real-world intelligent applications, such as QA … (voir plus)systems for healthcare, education, and finance. However, existing QA models lack interpretability and provide no feedback or explanation for end-users to help them understand why a specific prediction is the answer to a question. In this research, we argue that the evidences of an answer is critical to enhancing the interpretability of QA models. Unlike previous research that simply extracts several sentence(s) in the context as evidence, we are the first to explicitly define the concept of evidence as the supporting facts in a context which are informative, concise, and readable. Besides, we provide effective strategies to quantitatively measure the informativeness, conciseness and readability of evidence. Furthermore, we propose Grow-and-Clip Evidence Distillation (GCED) algorithm to extract evidences from the contexts by trade-off informativeness, conciseness, and readability. We conduct extensive experiments on the SQuAD and TriviaQA datasets with several baseline models to evaluate the effect of GCED on interpreting answers to questions. Human evaluation are also carried out to check the quality of distilled evidences. Experimental results show that automatic distilled evidences have human-like informativeness, conciseness and readability, which can enhance the interpretability of the answers to questions.

2022-01-13

ArXiv (preprint)

Feeding What You Need by Understanding What You Learned

Xiaoqiang Wang

Fangli Xu

Bo Long

Siliang Tang

Lingfei Wu

2022-01-01

ACL (1) (publié)