Publications

BanditSum: Extractive Summarization as a Contextual Bandit

Yue Dong

Yikang Shen

Eric Crawford

Herke van Hoof

In this work, we propose a novel method for training neural networks to perform single-document extractive summarization without heuristically-generated extractive labels. We call our approach BanditSum as it treats extractive summarization as a contextual bandit (CB) problem, where the model receives a document to summarize (the context), and chooses a sequence of sentences to include in the summary (the action). A policy gradient reinforcement learning algorithm is used to train the model to select sequences of sentences that maximize ROUGE score. We perform a series of experiments demonstrating that BanditSum is able to achieve ROUGE scores that are better than or comparable to the state-of-the-art for extractive summarization, and converges using significantly fewer update steps than competing approaches. In addition, we show empirically that BanditSum performs significantly better than competing approaches when good summary sentences appear late in the source document.

2018-10-01

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (published)

doi.org

arxiv.org

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

Zhilin Yang

Peng Qi

Saizheng Zhang

Yoshua Bengio

William W. Cohen

Russ Salakhutdinov

Christopher D Manning

Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and explain the predictions; (4) we offer a new type of factoid comparison questions to test QA systems’ ability to extract relevant facts and perform necessary comparison. We show that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.

2018-10-01

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (published)

doi.org

arxiv.org

A Knowledge Hunting Framework for Common Sense Reasoning

Ali Emami

Noelia De La Cruz

Adam Trischler

Kaheer Suleman

Jackie Cheung

We introduce an automatic system that achieves state-of-the-art results on the Winograd Schema Challenge (WSC), a common sense reasoning task that requires diverse, complex forms of inference and knowledge. Our method uses a knowledge hunting module to gather text from the web, which serves as evidence for candidate problem resolutions. Given an input problem, our system generates relevant queries to send to a search engine, then extracts and classifies knowledge from the returned results and weighs them to make a resolution. Our approach improves F1 performance on the full WSC by 0.21 over the previous best and represents the first system to exceed 0.5 F1. We further demonstrate that the approach is competitive on the Choice of Plausible Alternatives (COPA) task, which suggests that it is generally applicable.

2018-10-01

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (published)

doi.org

arxiv.org

Deep Graph Infomax

Petar Veličković

William Fedus

William L. Hamilton

Pietro Lio

Yoshua Bengio

(Rex) Devon Hjelm

2018-09-27

ArXiv (preprint)

arxiv.org

Probabilistic Planning with Sequential Monte Carlo methods

Alexandre Piché

Valentin Thomas

Cyril Ibrahim

Yoshua Bengio

Chris Pal

2018-09-27

International Conference on Learning Representations (published)

openreview.net

Exploring Uncertainty Measures in Deep Networks for Multiple Sclerosis Lesion Detection and Segmentation

Tanya Nair

Doina Precup

Douglas Arnold

Tal Arbel

2018-09-26

Medical Image Computing and Computer Assisted Intervention – MICCAI 2018 (published)

doi.org

arxiv.org

How can deep learning advance computational modeling of sensory information processing?

Jessica A.F. Thompson

Yoshua Bengio

Elia Formisano

Marc Schönwiesner

Deep learning, computational neuroscience, and cognitive science have overlapping goals related to understanding intelligence such that perception and behaviour can be simulated in computational systems. In neuroimaging, machine learning methods have been used to test computational models of sensory information processing. Recently, these model comparison techniques have been used to evaluate deep neural networks (DNNs) as models of sensory information processing. However, the interpretation of such model evaluations is muddied by imprecise statistical conclusions. Here, we make explicit the types of conclusions that can be drawn from these existing model comparison techniques and how these conclusions change when the model in question is a DNN. We discuss how DNNs are amenable to new model comparison techniques that allow for stronger conclusions to be made about the computational mechanisms underlying sensory information processing.

2018-09-25

ArXiv (preprint)

arxiv.org

On the Learning Dynamics of Deep Neural Networks

Remi Tachet des Combes

Mohammad Pezeshki

Samira Shabanian

Aaron Courville

Yoshua Bengio

While a lot of progress has been made in recent years, the dynamics of learning in deep nonlinear neural networks remain to this day largely misunderstood. In this work, we study the case of binary classification and prove various properties of learning in such networks under strong assumptions such as linear separability of the data. Extending existing results from the linear case, we confirm empirical observations by proving that the classification error also follows a sigmoidal shape in nonlinear architectures. We show that given proper initialization, learning expounds parallel independent modes and that certain regions of parameter space might lead to failed training. We also demonstrate that input norm and features' frequency in the dataset lead to distinct convergence speeds which might shed some light on the generalization capabilities of deep neural networks. We provide a comparison between the dynamics of learning with cross-entropy and hinge losses, which could prove useful to understand recent progress in the training of generative adversarial networks. Finally, we identify a phenomenon that we baptize gradient starvation where the most frequent features in a dataset prevent the learning of other less frequent but equally informative features.

2018-09-18

ArXiv (preprint)

arxiv.org

CNN Prediction of Future Disease Activity for Multiple Sclerosis Patients from Baseline MRI and Lesion Labels

Nazanin Mohammadi Sepahvand

Tal Hassner

Douglas Arnold

Tal Arbel

2018-09-16

BrainLes@MICCAI (published)

doi.org

3D U-Net for Brain Tumour Segmentation

Raghav Mehta

Tal Arbel

2018-09-16

BrainLes@MICCAI (published)

doi.org

How to Exploit Weaknesses in Biomedical Challenge Design and Organization

Annika Reinke

Matthias Eisenmann

Sinan Onogur

Marko Stankovic

Patrick Scholz

Peter M. Full

Hrvoje Bogunovic

Bennett Landman

Oskar Maier

Bjoern Menze

Gregory C. Sharp

Korsuk Sirinukunwattana

Stefanie Speidel

F. V. D. Sommen

Guoyan Zheng

Henning Müller

Michal Kozubek

Tal Arbel

Andrew P. Bradley

Pierre Jannin

Annette Kopp-Schneider

Lena Maier-Hein

2018-09-13

Medical Image Computing and Computer Assisted Intervention – MICCAI 2018 (published)

doi.org

RS-Net: Regression-Segmentation 3D CNN for Synthesis of Full Resolution Missing Brain MRI in the Presence of Tumours

Raghav Mehta

Tal Arbel

2018-09-12

Simulation and Synthesis in Medical Imaging (published)

doi.org

arxiv.org
