Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge
Spyridon Bakas
Mauricio Reyes
Andras Jakab
Stefan Bauer
Markus Rempfler
Alessandro Crimi
Russell T. Shinohara
Christoph Berger
Sung-min Ha
Martin Rozycki
Marcel W. Prastawa
Esther Alberts
Jana Lipková
John Freymann
Justin Kirby
Michel Bilello
Hassan M. Fathallah-Shaykh
Roland Wiest
J. Kirschke
Benedikt Wiestler
Rivka R. Colen
Aikaterini Kotrotsou
Pamela LaMontagne
D. Marcus
Mikhail Milchenko
Arash Nazeri
Marc-André Weber
Abhishek Mahajan
Ujjwal Baid
Dongjin Kwon
Manu Agarwal
Mahbubul Alam
Alberto Albiol
A. Albiol
Alex A. Varghese
T. Tuan
Aaron J. Avery
Bobade Pranjal
Subhashis Banerjee
Thomas H. Batchelder
Nematollah Batmanghelich
Enzo Battistella
Martin Bendszus
E. Benson
José Bernal
George Biros
Mariano Cabezas
Siddhartha Chandra
Yi-Ju Chang
et al.
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis, and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients who underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.
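A common way to score a predicted sub-region segmentation against a reference in this setting is the Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|). The sketch below is illustrative only: it assumes flattened voxel label maps and the label convention of recent BraTS releases (1 = necrotic/non-enhancing core, 2 = edema, 4 = enhancing tumor), merged into whole tumor, tumor core, and enhancing tumor regions. The `dice` helper and the region constants are hypothetical names, not the challenge's official evaluation code.

```python
from typing import Sequence, Set

# Hypothetical region definitions, assuming the 1/2/4 label convention.
WHOLE_TUMOR = {1, 2, 4}
TUMOR_CORE = {1, 4}
ENHANCING_TUMOR = {4}


def dice(pred: Sequence[int], truth: Sequence[int], region: Set[int]) -> float:
    """Dice overlap 2|A∩B| / (|A|+|B|) for one (possibly merged) sub-region.

    A voxel belongs to the region when its label is in `region`.
    """
    if len(pred) != len(truth):
        raise ValueError("label maps must have the same number of voxels")
    inter = size_pred = size_truth = 0
    for p, t in zip(pred, truth):
        in_pred = p in region
        in_truth = t in region
        inter += in_pred and in_truth  # bools count as 0/1
        size_pred += in_pred
        size_truth += in_truth
    if size_pred + size_truth == 0:
        return 1.0  # both masks empty: scored as agreement here (a convention choice)
    return 2.0 * inter / (size_pred + size_truth)


pred = [0, 1, 2, 4, 4, 0]
truth = [0, 1, 2, 2, 4, 4]
for name, region in [("WT", WHOLE_TUMOR), ("TC", TUMOR_CORE), ("ET", ENHANCING_TUMOR)]:
    print(name, round(dice(pred, truth, region), 3))  # WT 0.889, TC 0.667, ET 0.5
```

Merging labels into nested regions before scoring means a voxel misassigned between edema and core still counts as correct for the whole-tumor score.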
On the Evaluation of Common-Sense Reasoning in Natural Language Understanding
Paul Trichelair
Ali Emami
Adam Trischler
Kaheer Suleman
The NLP and ML communities have long been interested in developing models capable of common-sense reasoning, and recent works have significantly improved the state of the art on benchmarks like the Winograd Schema Challenge (WSC). Despite these advances, the complexity of tasks designed to test common-sense reasoning remains under-analyzed. In this paper, we make a case study of the Winograd Schema Challenge and, based on two new measures of instance-level complexity, design a protocol that both clarifies and qualifies the results of previous work. Our protocol accounts for the WSC's limited size and variable instance difficulty, properties common to other common-sense benchmarks. Accounting for these properties when assessing model results may prevent unjustified conclusions.
The Hard-CoRe Coreference Corpus: Removing Gender and Number Cues for Difficult Pronominal Anaphora Resolution
Ali Emami
Paul Trichelair
Adam Trischler
Kaheer Suleman
Hannes Schulz
We introduce a new benchmark task for coreference resolution, Hard-CoRe, that targets common-sense reasoning and world knowledge. Previous coreference resolution tasks have been overly vulnerable to systems that simply exploit the number and gender of the antecedents, or have been handcrafted and do not reflect the diversity of sentences in naturally occurring text. With these limitations in mind, we present a resolution task that is both challenging and realistic. We demonstrate that various coreference systems, whether rule-based, feature-rich, graphical, or neural-based, perform at or slightly above random on the task, whereas human performance is very strong with high inter-annotator agreement. To explain this performance gap, we show empirically that state-of-the-art models often fail to capture context and rely only on the antecedents to make a decision.
The KnowRef Coreference Corpus: Removing Gender and Number Cues for Difficult Pronominal Anaphora Resolution
Ali Emami
Paul Trichelair
Adam Trischler
Kaheer Suleman
Hannes Schulz
We introduce a new benchmark for coreference resolution and NLI, KnowRef, that targets common-sense understanding and world knowledge. Previous coreference resolution tasks can largely be solved by exploiting the number and gender of the antecedents, or have been handcrafted and do not reflect the diversity of naturally occurring text. We present a corpus of over 8,000 annotated text passages with ambiguous pronominal anaphora. These instances are both challenging and realistic. We show that various coreference systems, whether rule-based, feature-rich, or neural, perform significantly worse on the task than humans, who display high inter-annotator agreement. To explain this performance gap, we show empirically that state-of-the-art models often fail to capture context, instead relying on the gender or number of candidate antecedents to make a decision. We then use problem-specific insights to propose a data-augmentation trick called antecedent switching to alleviate this tendency in models. Finally, we show that antecedent switching yields promising results on other tasks as well: we use it to achieve state-of-the-art results on the GAP coreference task.
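The abstract names antecedent switching without spelling it out. One plausible reading, sketched here as an assumption rather than the authors' exact procedure: swap every occurrence of the two candidate antecedents in a passage, so the pronoun keeps its syntactic role but the gold answer moves to the other name, which pushes a model to rely on context instead of on the names themselves. The `Instance` type and `switch_antecedents` function are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Instance:
    text: str
    candidates: Tuple[str, str]
    answer: str  # the candidate the ambiguous pronoun refers to


def switch_antecedents(inst: Instance) -> Instance:
    """Return a copy with the two candidate names swapped and the label flipped."""
    a, b = inst.candidates
    # One regex pass swaps both names at once, so a->b and b->a do not clobber each other.
    pattern = re.compile(r"\b(" + re.escape(a) + "|" + re.escape(b) + r")\b")
    swapped = pattern.sub(lambda m: b if m.group(0) == a else a, inst.text)
    new_answer = b if inst.answer == a else a
    return Instance(swapped, inst.candidates, new_answer)


original = Instance("John paid Bill because he owed him money.", ("John", "Bill"), "John")
augmented = switch_antecedents(original)
print(augmented.text, "->", augmented.answer)
```

Under this reading, a training set would contain both `original` and `augmented`; a model that merely memorizes which name tends to be the answer gets each pair half wrong.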
Sim-to-Real Transfer with Neural-Augmented Robot Simulation
Florian Golemo
Adrien Ali Taiga
Pierre-Yves Oudeyer
Despite the recent successes of deep reinforcement learning, teaching complex motor skills to a physical robot remains a hard problem. While learning directly on a real system is usually impractical, doing so in simulation has proven to be fast and safe. Nevertheless, because of the "reality gap," policies trained in simulation often perform poorly when deployed on a real system. In this work, we introduce a method for training a recurrent neural network on the differences between simulated and real robot trajectories and then using this model to augment the simulator. This Neural-Augmented Simulation (NAS) can be used to learn control policies that transfer significantly better to real environments than policies learned on existing simulators. We demonstrate the potential of our approach through a set of experiments on the MuJoCo simulator with added backlash and the Poppy Ergo Jr robot. NAS allows us to learn policies that are competitive with ones that would have been learned directly on the real robot.
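The core idea, correcting a simulator's next-state prediction with a model fitted to sim-versus-real differences, can be shown in miniature. This sketch makes strong simplifying assumptions: a one-dimensional state, a residual that depends only on the current action, and an ordinary least-squares line in place of the paper's recurrent network (which also conditions on trajectory history). The toy dynamics and all function names are hypothetical.

```python
from typing import Callable, List, Tuple

Step = Callable[[float, float], float]  # (state, action) -> next state


def fit_residual(actions: List[float], residuals: List[float]) -> Tuple[float, float]:
    """Least-squares line residual ≈ w * action + c (stand-in for the paper's RNN)."""
    n = len(actions)
    mean_a = sum(actions) / n
    mean_r = sum(residuals) / n
    var = sum((a - mean_a) ** 2 for a in actions)
    cov = sum((a - mean_a) * (r - mean_r) for a, r in zip(actions, residuals))
    w = cov / var if var else 0.0
    return w, mean_r - w * mean_a


def augment(sim_step: Step, w: float, c: float) -> Step:
    """Wrap the simulator so each step adds the learned correction."""
    def step(state: float, action: float) -> float:
        return sim_step(state, action) + w * action + c
    return step


# Toy setup (assumed): the simulator ignores a 20% actuation loss the real robot has.
sim_step: Step = lambda s, a: s + a
real_step: Step = lambda s, a: s + 0.8 * a

states = [0.0, 1.0, -2.0, 0.5]
actions = [1.0, 2.0, 3.0, -1.0]
residuals = [real_step(s, a) - sim_step(s, a) for s, a in zip(states, actions)]
w, c = fit_residual(actions, residuals)
nas_step = augment(sim_step, w, c)
print(nas_step(5.0, 2.0), real_step(5.0, 2.0))
```

Learning only the residual keeps the simulator's physics as the backbone, so the learned model has to capture the mismatch rather than the full dynamics.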
BabyAI: First Steps Towards Grounded Language Learning With a Human In the Loop
Maxime Chevalier-Boisvert
Salem Lahlou
Lucas Willems
Chitwan Saharia
Thien Huu Nguyen
Allowing humans to interactively train artificial agents to understand language instructions is desirable for both practical and scientific reasons, but given the poor data efficiency of the current learning methods, this goal may require substantial research efforts. Here, we introduce the BabyAI research platform to support investigations towards including humans in the loop for grounded language learning. The BabyAI platform comprises an extensible suite of 19 levels of increasing difficulty. The levels gradually lead the agent towards acquiring a combinatorially rich synthetic language which is a proper subset of English. The platform also provides a heuristic expert agent for the purpose of simulating a human teacher. We report baseline results and estimate the amount of human involvement that would be required to train a neural network-based agent on some of the BabyAI levels. We put forward strong evidence that current deep learning methods are not yet sufficiently sample efficient when it comes to learning a language with compositional properties.
Deep Learning. Das umfassende Handbuch (English: Deep Learning: The Comprehensive Handbook)
Visual Reasoning with Multi-hop Feature Modulation
Florian Strub
Mathieu Seurin
Ethan Perez
Harm de Vries
Jérémie Mary
P. Preux
Olivier Pietquin
Automatic differentiation in ML: Where we are and where we should be going
Bart van Merriënboer
Olivier Breuleux
Arnaud Bergeron
Pascal Lamblin
We review the current state of automatic differentiation (AD) for array programming in machine learning (ML), including the different approaches such as operator overloading (OO) and source transformation (ST) used for AD, graph-based intermediate representations for programs, and source languages. Based on these insights, we introduce a new graph-based intermediate representation (IR) which specifically aims to efficiently support fully-general AD for array programming. Unlike existing dataflow programming representations in ML frameworks, our IR naturally supports function calls, higher-order functions and recursion, making ML models easier to implement. The ability to represent closures allows us to perform AD using ST without a tape, making the resulting derivative (adjoint) program amenable to ahead-of-time optimization using tools from functional language compilers, and enabling higher-order derivatives. Lastly, we introduce a proof-of-concept compiler toolchain called Myia which uses a subset of Python as a front end.
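Operator overloading, one of the AD approaches the review compares, can be illustrated with forward-mode dual numbers: each value carries its derivative, and overloaded arithmetic applies the chain rule as the program runs. This is a minimal sketch of that general technique only, not of Myia, which the abstract describes as using source transformation on a graph-based IR. The `Dual`, `sin`, and `derivative` names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    val: float
    dot: float = 0.0  # derivative carried alongside the value

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val, self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def sin(x: Dual) -> Dual:
    # Chain rule for an elementary function: (sin u)' = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)


def derivative(f, x: float) -> float:
    """Seed dx/dx = 1 and read the derivative off the result."""
    return f(Dual(x, 1.0)).dot


print(derivative(lambda x: x * x + 3 * x, 2.0))  # 7.0
```

Because the derivative is computed by running the overloaded program, no separate derivative program exists to optimize ahead of time; that is the limitation source transformation is meant to address.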
BanditSum: Extractive Summarization as a Contextual Bandit
Yue Dong
Yikang Shen
Eric Crawford
Herke van Hoof
In this work, we propose a novel method for training neural networks to perform single-document extractive summarization without heuristically generated extractive labels. We call our approach BanditSum as it treats extractive summarization as a contextual bandit (CB) problem, where the model receives a document to summarize (the context), and chooses a sequence of sentences to include in the summary (the action). A policy gradient reinforcement learning algorithm is used to train the model to select sequences of sentences that maximize ROUGE score. We perform a series of experiments demonstrating that BanditSum is able to achieve ROUGE scores that are better than or comparable to the state-of-the-art for extractive summarization, and converges using significantly fewer update steps than competing approaches. In addition, we show empirically that BanditSum performs significantly better than competing approaches when good summary sentences appear late in the source document.
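The contextual-bandit framing can be sketched with REINFORCE: sample a set of sentences without replacement from a softmax policy, score the set, and raise the log-probability of sets that beat a running baseline. Everything concrete here is an assumption for illustration: the policy is a free logit per sentence for a single document rather than a neural network conditioned on the document, and a toy reward stands in for ROUGE. Function names are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Tuple


def sample_without_replacement(
    logits: List[float], k: int, rng: random.Random
) -> Tuple[List[int], List[Dict[int, float]]]:
    """Pick k distinct sentences, each from a softmax over those not yet picked."""
    remaining = list(range(len(logits)))
    chosen: List[int] = []
    step_probs: List[Dict[int, float]] = []
    for _ in range(k):
        m = max(logits[i] for i in remaining)  # subtract max for numerical stability
        weights = [math.exp(logits[i] - m) for i in remaining]
        z = sum(weights)
        step_probs.append({i: w / z for i, w in zip(remaining, weights)})
        pick = rng.choices(remaining, weights=weights)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen, step_probs


def train(num_sents: int, k: int, reward_fn: Callable[[List[int]], float],
          steps: int = 2000, lr: float = 0.1, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    logits = [0.0] * num_sents
    baseline = 0.0
    for _ in range(steps):
        chosen, step_probs = sample_without_replacement(logits, k, rng)
        reward = reward_fn(chosen)
        advantage = reward - baseline
        baseline += 0.1 * (reward - baseline)  # running-average baseline
        # d log pi / d logit_i = sum over steps of [i was picked] - p_step(i)
        grad = [0.0] * num_sents
        for pick, probs in zip(chosen, step_probs):
            for i, p in probs.items():
                grad[i] -= p
            grad[pick] += 1.0
        for i in range(num_sents):
            logits[i] += lr * advantage * grad[i]
    return logits


# Toy reward (assumed): fraction of picks among the "good" sentences, placed late in the document.
GOOD = {2, 3}
logits = train(num_sents=4, k=2, reward_fn=lambda chosen: len(set(chosen) & GOOD) / 2)
print([round(x, 2) for x in logits])
```

Only a scalar reward for the whole selected set is needed, which is what lets this framing avoid sentence-level extractive labels.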
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Zhilin Yang
Peng Qi
Saizheng Zhang
William W. Cohen
Russ Salakhutdinov
Christopher D. Manning
Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and explain the predictions; (4) we offer a new type of factoid comparison questions to test QA systems’ ability to extract relevant facts and perform necessary comparison. We show that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.
A Knowledge Hunting Framework for Common Sense Reasoning
Ali Emami
Noelia De La Cruz
Adam Trischler
Kaheer Suleman
We introduce an automatic system that achieves state-of-the-art results on the Winograd Schema Challenge (WSC), a common-sense reasoning task that requires diverse, complex forms of inference and knowledge. Our method uses a knowledge hunting module to gather text from the web, which serves as evidence for candidate problem resolutions. Given an input problem, our system generates relevant queries to send to a search engine, then extracts and classifies knowledge from the returned results and weighs them to make a resolution. Our approach improves F1 performance on the full WSC by 0.21 over the previous best and is the first system to exceed 0.5 F1. We further demonstrate that the approach is competitive on the Choice of Plausible Alternatives (COPA) task, which suggests that it is generally applicable.