Yue Dong

Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization

Meng Cao

Jackie Chi Kit Cheung

State-of-the-art abstractive summarization systems often generate hallucinations; i.e., content that is not directly inferable from the source text. Despite being assumed to be incorrect, we find that much hallucinated content is actually consistent with world knowledge, which we call factual hallucinations. Including these factual hallucinations in a summary can be beneficial because they provide useful background information. In this work, we propose a novel detection approach that separates factual from non-factual hallucinations of entities. Our method is based on an entity’s prior and posterior probabilities according to pre-trained and finetuned masked language models, respectively. Empirical results suggest that our method vastly outperforms two baselines in both accuracy and F1 scores and has a strong correlation with human judgments on factuality classification tasks. Furthermore, we use our method as a reward signal to train a summarization system using an off-line reinforcement learning (RL) algorithm that can significantly improve the factuality of generated summaries while maintaining the level of abstractiveness.

2022-04-30

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (published)

doi.org

arxiv.org

Learning with Rejection for Abstractive Text Summarization

Meng Cao

Yue Dong

Jingyi He

Jackie CK Cheung

2021-12-31

EMNLP (published)

doi.org

arxiv.org

On-the-Fly Attention Modulation for Neural Generation

Yue Dong

Chandra Bhagavatula

Ximing Lu

Jena D. Hwang

Antoine Bosselut

Jackie CK Cheung

Yejin Choi

Despite considerable advancements with deep neural language models (LMs), neural text generation still suffers from degeneration: the generated text is repetitive, generic, self-contradictory, and often lacks commonsense. Our analyses on sentence-level attention patterns in LMs reveal that neural degeneration may be associated with insufficient learning of task-specific characteristics by the attention mechanism. This finding motivates on-the-fly attention modulation -- a simple but effective method that enables the injection of priors into attention computation during inference. Automatic and human evaluation results on three text generation benchmarks demonstrate that attention modulation helps LMs generate text with enhanced fluency, creativity, and commonsense reasoning, in addition to significantly reducing sentence-level repetition.

2021-07-31

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (published)

doi.org

arxiv.org

Discourse-Aware Unsupervised Summarization for Long Scientific Documents

Yue Dong

Andrei Mircea

Jackie CK Cheung

2021-03-31

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (published)

doi.org

Inspecting the Factuality of Hallucinated Entities in Abstractive Summarization

Meng Cao

Yue Dong

Jackie CK Cheung

State-of-the-art abstractive summarization systems often generate hallucinations; i.e., content that is not directly inferable from the source text. Despite being assumed incorrect, many of the hallucinated contents are consistent with world knowledge (factual hallucinations). Including these factual hallucinations into a summary can be beneficial in providing additional background information. In this work, we propose a novel detection approach that separates factual from non-factual hallucinations of entities. Our method is based on an entity’s prior and posterior probabilities according to pre-trained and finetuned masked language models, respectively. Empirical results suggest that our method vastly outperforms three strong baselines in both accuracy and F1 scores and has a strong correlation with human judgements on factuality classification tasks. Furthermore, our approach can provide insight into whether a particular hallucination is caused by the summarizer’s pre-training or fine-tuning step.

2020-12-31

arXiv.org (preprint)

dblp.uni-trier.de

On-the-Fly Attention Modularization for Neural Generation

Yue Dong

Chandra Bhagavatula

Ximing Lu

Jena D. Hwang

Antoine Bosselut

Jackie CK Cheung

Yejin Choi

Despite considerable advancements with deep neural language models (LMs), neural text generation still suffers from degeneration: generated text is repetitive, generic, self-inconsistent, and lacking commonsense. The empirical analyses on sentence-level attention patterns reveal that neural text degeneration may be associated with insufficient learning of inductive biases by the attention mechanism. Our findings motivate on-the-fly attention modularization, a simple but effective method for injecting inductive biases into attention computation during inference. The resulting text produced by the language model with attention modularization can yield enhanced diversity and commonsense reasoning while maintaining fluency and coherence.

2020-12-31

arXiv.org (preprint)

dblp.uni-trier.de

Factual Error Correction for Abstractive Summarization Models

Meng Cao

Yue Dong

Jiapeng Wu

Jackie Chi Kit Cheung

Neural abstractive summarization systems have achieved promising progress, thanks to the availability of large-scale datasets and models pre-trained with self-supervised methods. However, ensuring the factual consistency of the generated summaries for abstractive summarization systems is a challenge. We propose a post-editing corrector module to address this issue by identifying and correcting factual errors in generated summaries. The neural corrector model is pre-trained on artificial examples that are created by applying a series of heuristic transformations on reference summaries. These transformations are inspired by an error analysis of state-of-the-art summarization model outputs. Experimental results show that our model is able to correct factual errors in summaries generated by other neural summarization models and outperforms previous models on factual consistency evaluation on the CNN/DailyMail dataset. We also find that transferring from artificial error correction to downstream settings is still very challenging.

2020-10-31

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (published)

doi.org

arxiv.org

Multi-Fact Correction in Abstractive Text Summarization

Yue Dong

Shuohang Wang

Zhe Gan

Yu Cheng

Jackie CK Cheung

Jingjing Liu

2020-10-31

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (published)

doi.org

arxiv.org

Multi-XScience: A Large-scale Dataset for Extreme Multi-document Summarization of Scientific Articles

Yao Lu

Yue Dong

Laurent Charlin

Multi-document summarization is a challenging task for which there exist few large-scale datasets. We propose Multi-XScience, a large-scale multi-document summarization dataset created from scientific articles. Multi-XScience introduces a challenging multi-document summarization task: writing the related-work section of a paper based on its abstract and the articles it references. Our work is inspired by extreme summarization, a dataset construction protocol that favours abstractive modeling approaches. Descriptive statistics and empirical results, using several state-of-the-art models trained on the Multi-XScience dataset, reveal that Multi-XScience is well suited for abstractive models.

2020-10-31

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (published)

doi.org

arxiv.org

HipoRank: Incorporating Hierarchical and Positional Information into Graph-based Unsupervised Long Document Extractive Summarization

Yue Dong

Andrei Mircea

Jackie CK Cheung

We propose a novel graph-based ranking model for unsupervised extractive summarization of long documents. Graph-based ranking models typically represent documents as undirected fully-connected graphs, where a node is a sentence, an edge is weighted based on sentence-pair similarity, and sentence importance is measured via node centrality. Our method leverages positional and hierarchical information grounded in discourse structure to augment a document's graph representation with hierarchy and directionality. Experimental results on PubMed and arXiv datasets show that our approach outperforms strong unsupervised baselines by wide margins and performs comparably to some of the state-of-the-art supervised models that are trained on hundreds of thousands of examples. In addition, we find that our method provides comparable improvements with various distributional sentence representations, including BERT and RoBERTa models fine-tuned on sentence similarity.

2020-04-30

ArXiv (preprint)

arxiv.org

Countering the Effects of Lead Bias in News Summarization via Multi-Stage Training and Auxiliary Losses

Matt Grenander

Yue Dong

Jackie CK Cheung

Annie Priyadarshini Louis

Sentence position is a strong feature for news summarization, since the lead often (but not always) summarizes the key points of the article. In this paper, we show that recent neural systems excessively exploit this trend, which although powerful for many inputs, is also detrimental when summarizing documents where important content should be extracted from later parts of the article. We propose two techniques to make systems sensitive to the importance of content in different parts of the article. The first technique employs ‘unbiased’ data; i.e., randomly shuffled sentences of the source document, to pretrain the model. The second technique uses an auxiliary ROUGE-based loss that encourages the model to distribute importance scores throughout a document by mimicking sentence-level ROUGE scores on the training data. We show that these techniques significantly improve the performance of a competitive reinforcement learning based extractive system, with the auxiliary loss being more powerful than pretraining.

2019-10-31

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (published)

doi.org

arxiv.org

Learning Multi-Task Communication with Message Passing for Sequence Learning

Pengfei Liu

Jie Fu

Yue Dong

Xipeng Qiu

Jackie CK Cheung

We present two architectures for multi-task learning with neural sequence models. Our approach allows the relationships between different tasks to be learned dynamically, rather than using an ad-hoc pre-defined structure as in previous work. We adopt the idea from message-passing graph neural networks, and propose a general graph multi-task learning framework in which different tasks can communicate with each other in an effective and interpretable way. We conduct extensive experiments in text classification and sequence labelling to evaluate our approach on multi-task learning and transfer learning. The empirical results show that our models not only outperform competitive baselines, but also learn interpretable and transferable patterns across tasks.

2019-07-16

Proceedings of the AAAI Conference on Artificial Intelligence (published)

doi.org
