Publications

Global Surveillance of COVID-19 by mining news media using a multi-source dynamic embedded topic model

Yue Li

Pratheeksha Nair

Zhi Wen

Imane Chafi

Anya Okhmatovskaia

Guido Powell

Yannan Shen

David Buckeridge

2020-11-10

Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (published)

doi.org

On Posterior Collapse and Encoder Feature Dispersion in Sequence VAEs

Teng Long

Yanshuai Cao

Jackie Cheung

Variational autoencoders (VAEs) hold great potential for modelling text, as they could in theory separate high-level semantic and syntactic properties from local regularities of natural language. Practically, however, VAEs with autoregressive decoders often suffer from posterior collapse, a phenomenon where the model learns to ignore the latent variables, causing the sequence VAE to degenerate into a language model. In this paper, we argue that posterior collapse is in part caused by the lack of dispersion in encoder features. We provide empirical evidence to verify this hypothesis, and propose a straightforward fix using pooling. This simple technique effectively prevents posterior collapse, allowing the model to achieve significantly better data log-likelihood than standard sequence VAEs. Compared to existing work, our proposed method is able to achieve comparable or superior performance while being more computationally efficient.

2020-11-10

(published)

www.semanticscholar.org

Approximate Planning and Learning for Partially Observed Systems

Aditya Mahajan

2020-11-09

International Conference of Control, Dynamic Systems, and Robotics (published)

doi.org

Effectiveness of quarantine and testing to prevent COVID-19 transmission from arriving travelers

Russell Wa

David Buckeridge

2020-11-04

medRxiv (preprint)

doi.org

Explainability and Interpretability: Keys to Deep Medicine

Arash Shaban-Nejad

Martin Michalowski

David Buckeridge

2020-11-03

Explainable AI in Healthcare and Medicine (published)

doi.org

Bisimulation metrics and norms for real-weighted automata

Borja Balle

Pascale Gourdeau

Prakash Panangaden

2020-11-01

Information and Computation (published)

doi.org

ComplexDataLab at W-NUT 2020 Task 2: Detecting Informative COVID-19 Tweets by Attending over Linked Documents

Kellin Pelrine

Jacob Danovitch

Albert Orozco Camacho

Reihaneh Rabbany

Given the global scale of COVID-19 and the flood of social media content related to it, how can we find informative discussions? We present Gapformer, which effectively classifies content as informative or not. It reformulates the problem as graph classification, drawing on not only the tweet but also connected webpages and entities. We leverage a pre-trained language model as well as the connections between nodes to learn a pooled representation for each document network. We show it outperforms several competitive baselines and present ablation studies supporting the benefit of the linked information. Code is available on GitHub.

2020-11-01

WNUT (published)

doi.org

Deconstructing Word Embedding Algorithms

Kian Kenyon-Dean

Edward Daniel Newell

Jackie Cheung

2020-11-01

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (published)

doi.org

arxiv.org

Factual Error Correction for Abstractive Summarization Models

Meng Cao

Yue Dong

Jiapeng Wu

Jackie Cheung

Neural abstractive summarization systems have achieved promising progress, thanks to the availability of large-scale datasets and models pre-trained with self-supervised methods. However, ensuring the factual consistency of the generated summaries for abstractive summarization systems is a challenge. We propose a post-editing corrector module to address this issue by identifying and correcting factual errors in generated summaries. The neural corrector model is pre-trained on artificial examples that are created by applying a series of heuristic transformations on reference summaries. These transformations are inspired by an error analysis of state-of-the-art summarization model outputs. Experimental results show that our model is able to correct factual errors in summaries generated by other neural summarization models and outperforms previous models on factual consistency evaluation on the CNN/DailyMail dataset. We also find that transferring from artificial error correction to downstream settings is still very challenging.

2020-11-01

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (published)

doi.org

arxiv.org

MeDAL: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining

Zhi Wen

Xing Han Lu

Siva Reddy

2020-11-01

Proceedings of the 3rd Clinical Natural Language Processing Workshop (published)

doi.org

arxiv.org

Multi-Fact Correction in Abstractive Text Summarization

Yue Dong

Shuohang Wang

Zhe Gan

Yu Cheng

Jackie Cheung

Jingjing Liu

2020-11-01

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (published)

doi.org

arxiv.org

Multi-XScience: A Large-scale Dataset for Extreme Multi-document Summarization of Scientific Articles

Yao Lu

Yue Dong

Laurent Charlin

Multi-document summarization is a challenging task for which there exist few large-scale datasets. We propose Multi-XScience, a large-scale multi-document summarization dataset created from scientific articles. Multi-XScience introduces a challenging multi-document summarization task: writing the related-work section of a paper based on its abstract and the articles it references. Our work is inspired by extreme summarization, a dataset construction protocol that favours abstractive modeling approaches. Descriptive statistics and empirical results, using several state-of-the-art models trained on the Multi-XScience dataset, reveal that Multi-XScience is well suited for abstractive models.

2020-11-01

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (published)

doi.org

arxiv.org
