BanditSum: Extractive Summarization as a Contextual Bandit
Yue Dong
Yikang Shen
Eric Crawford
Herke van Hoof
In this work, we propose a novel method for training neural networks to perform single-document extractive summarization without heuristically-generated extractive labels. We call our approach BanditSum as it treats extractive summarization as a contextual bandit (CB) problem, where the model receives a document to summarize (the context), and chooses a sequence of sentences to include in the summary (the action). A policy gradient reinforcement learning algorithm is used to train the model to select sequences of sentences that maximize ROUGE score. We perform a series of experiments demonstrating that BanditSum is able to achieve ROUGE scores that are better than or comparable to the state-of-the-art for extractive summarization, and converges using significantly fewer update steps than competing approaches. In addition, we show empirically that BanditSum performs significantly better than competing approaches when good summary sentences appear late in the source document.
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Zhilin Yang
Peng Qi
Saizheng Zhang
William W. Cohen
Russ Salakhutdinov
Christopher D Manning
Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and explain the predictions; (4) we offer a new type of factoid comparison questions to test QA systems’ ability to extract relevant facts and perform necessary comparison. We show that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.
A Knowledge Hunting Framework for Common Sense Reasoning
Ali Emami
Noelia De La Cruz
Adam Trischler
Kaheer Suleman
We introduce an automatic system that achieves state-of-the-art results on the Winograd Schema Challenge (WSC), a common sense reasoning task that requires diverse, complex forms of inference and knowledge. Our method uses a knowledge hunting module to gather text from the web, which serves as evidence for candidate problem resolutions. Given an input problem, our system generates relevant queries to send to a search engine, then extracts and classifies knowledge from the returned results and weighs them to make a resolution. Our approach improves F1 performance on the full WSC by 0.21 over the previous best and represents the first system to exceed 0.5 F1. We further demonstrate that the approach is competitive on the Choice of Plausible Alternatives (COPA) task, which suggests that it is generally applicable.
Deep Graph Infomax
Petar Veličković
William Fedus
William L. Hamilton
Pietro Lio
Probabilistic Planning with Sequential Monte Carlo methods
Alexandre Piché
Valentin Thomas
Cyril Ibrahim
Exploring Uncertainty Measures in Deep Networks for Multiple Sclerosis Lesion Detection and Segmentation
Tanya Nair
Douglas Arnold
How can deep learning advance computational modeling of sensory information processing?
Jessica A.F. Thompson
Elia Formisano
Marc Schönwiesner
Deep learning, computational neuroscience, and cognitive science have overlapping goals related to understanding intelligence such that perception and behaviour can be simulated in computational systems. In neuroimaging, machine learning methods have been used to test computational models of sensory information processing. Recently, these model comparison techniques have been used to evaluate deep neural networks (DNNs) as models of sensory information processing. However, the interpretation of such model evaluations is muddied by imprecise statistical conclusions. Here, we make explicit the types of conclusions that can be drawn from these existing model comparison techniques and how these conclusions change when the model in question is a DNN. We discuss how DNNs are amenable to new model comparison techniques that allow for stronger conclusions to be made about the computational mechanisms underlying sensory information processing.
On the Learning Dynamics of Deep Neural Networks
Remi Tachet des Combes
Mohammad Pezeshki
Samira Shabanian
While a lot of progress has been made in recent years, the dynamics of learning in deep nonlinear neural networks remain to this day largely misunderstood. In this work, we study the case of binary classification and prove various properties of learning in such networks under strong assumptions such as linear separability of the data. Extending existing results from the linear case, we confirm empirical observations by proving that the classification error also follows a sigmoidal shape in nonlinear architectures. We show that given proper initialization, learning expounds parallel independent modes and that certain regions of parameter space might lead to failed training. We also demonstrate that input norm and features' frequency in the dataset lead to distinct convergence speeds which might shed some light on the generalization capabilities of deep neural networks. We provide a comparison between the dynamics of learning with cross-entropy and hinge losses, which could prove useful to understand recent progress in the training of generative adversarial networks. Finally, we identify a phenomenon that we baptize gradient starvation where the most frequent features in a dataset prevent the learning of other less frequent but equally informative features.
CNN Prediction of Future Disease Activity for Multiple Sclerosis Patients from Baseline MRI and Lesion Labels
Nazanin Mohammadi Sepahvand
Tal Hassner
Douglas Arnold
3D U-Net for Brain Tumour Segmentation
Raghav Mehta
How to Exploit Weaknesses in Biomedical Challenge Design and Organization
Annika Reinke
Matthias Eisenmann
Sinan Onogur
Marko Stankovic
Patrick Scholz
Peter M. Full
Hrvoje Bogunovic
Bennett Landman
Oskar Maier
Bjoern Menze
Gregory C. Sharp
Korsuk Sirinukunwattana
Stefanie Speidel
F. V. D. Sommen
Guoyan Zheng
Henning Müller
Michal Kozubek
Andrew P. Bradley
Pierre Jannin
Annette Kopp-Schneider
Lena Maier-Hein
RS-Net: Regression-Segmentation 3D CNN for Synthesis of Full Resolution Missing Brain MRI in the Presence of Tumours
Raghav Mehta