Publications

Predicting Future Disease Activity and Treatment Responders for Multiple Sclerosis Patients Using a Bag-of-Lesions Brain Representation

Andrew Doyle

Doina Precup

Douglas Arnold

Tal Arbel

2017-09-04

Medical Image Computing and Computer Assisted Intervention − MICCAI 2017 (published)

doi.org

Graphs in Biomedical Image Analysis, Computational Anatomy and Imaging Genetics

M. Cardoso

Tal Arbel

Enzo Ferrante

Xavier Pennec

Adrian Dalca

Sarah Parisot

S. Joshi

Nematollah Batmanghelich

Aristeidis Sotiras

Mads Lenstrup Nielsen

M. Sabuncu

Tom Fletcher

Li Shen

Stanley Durrleman

Stefan H. Sommer

2017-09-01

Lecture Notes in Computer Science (published)

doi.org

World Knowledge for Reading Comprehension: Rare Entity Prediction with Hierarchical LSTMs Using External Descriptions

Teng Long

Emmanuel Bengio

Ryan Lowe

Jackie Cheung

Doina Precup

Humans interpret texts with respect to some background information, or world knowledge, and we would like to develop automatic reading comprehension systems that can do the same. In this paper, we introduce a task and several models to drive progress towards this goal. In particular, we propose the task of rare entity prediction: given a web document with several entities removed, models are tasked with predicting the correct missing entities conditioned on the document context and the lexical resources. This task is challenging due to the diversity of language styles and the extremely large number of rare entities. We propose two recurrent neural network architectures which make use of external knowledge in the form of entity descriptions. Our experiments show that our hierarchical LSTM model performs significantly better at the rare entity prediction task than those that do not make use of external resources.

2017-09-01

Conference on Empirical Methods in Natural Language Processing (published)

doi.org

End-to-end optimization of goal-driven and visually grounded dialogue systems

Florian Strub

Harm de Vries

Jérémie Mary

Bilal Piot

Aaron Courville

Olivier Pietquin

End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning. Yet, most current approaches cast human-machine dialogue management as a supervised learning problem, aiming at predicting the next utterance of a participant given the full history of the dialogue. This vision is too simplistic to render the intrinsic planning problem inherent to dialogue as well as its grounded nature, making the context of a dialogue larger than the sole history. This is why only chitchat and question answering tasks have been addressed so far using end-to-end architectures. In this paper, we introduce a Deep Reinforcement Learning method to optimize visually grounded task-oriented dialogues, based on the policy gradient algorithm. This approach is tested on a dataset of 120k dialogues collected through Mechanical Turk and provides encouraging results at solving both the problem of generating natural dialogues and the task of discovering a specific object in a complex picture.

2017-08-19

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (published)

doi.org

arxiv.org

Adversarial Generation of Natural Language

Sandeep Subramanian

Sai Rajeswar

Francis Dutil

Chris Pal

Aaron Courville

Generative Adversarial Networks (GANs) have gathered a lot of attention from the computer vision community, yielding impressive results for image generation. Advances in the adversarial generation of natural language from noise however are not commensurate with the progress made in generating images, and still lag far behind likelihood based methods. In this paper, we take a step towards generating natural language with a GAN objective alone. We introduce a simple baseline that addresses the discrete output space problem without relying on gradient estimators and show that it is able to achieve state-of-the-art results on a Chinese poem generation dataset. We present quantitative results on generating sentences from context-free and probabilistic context-free grammars, and qualitative language modeling results. A conditional version is also described that can generate sequences conditioned on sentence characteristics.

2017-08-01

Proceedings of the 2nd Workshop on Representation Learning for NLP (published)

doi.org

arxiv.org

Predicting Success in Goal-Driven Human-Human Dialogues

Michael Noseworthy

Jackie Cheung

Joelle Pineau

In goal-driven dialogue systems, success is often defined based on a structured definition of the goal. This requires that the dialogue system be constrained to handle a specific class of goals and that there be a mechanism to measure success with respect to that goal. However, in many human-human dialogues the diversity of goals makes it infeasible to define success in such a way. To address this scenario, we consider the task of automatically predicting success in goal-driven human-human dialogues using only the information communicated between participants in the form of text. We build a dataset from stackoverflow.com which consists of exchanges between two users in the technical domain where ground-truth success labels are available. We then propose a turn-based hierarchical neural network model that can be used to predict success without requiring a structured goal definition. We show this model outperforms rule-based heuristics and other baselines as it is able to detect patterns over the course of a dialogue and capture notions such as gratitude.

2017-08-01

SIGDIAL Conference (published)

doi.org

A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images

David Vazquez

Jorge Bernal

F. Javier Sánchez

Gloria Fernández-Esparrach

Antonio M. López

Adriana Romero Soriano

Michal Drozdzal

Aaron Courville

Colorectal cancer (CRC) is the third cause of cancer death worldwide. Currently, the standard approach to reduce CRC-related mortality is to perform regular screening in search for polyps and colonoscopy is the screening tool of choice. The main limitations of this screening procedure are polyp miss rate and the inability to perform visual assessment of polyp malignancy. These drawbacks can be reduced by designing decision support systems (DSS) aiming to help clinicians in the different stages of the procedure by providing endoluminal scene segmentation. Thus, in this paper, we introduce an extended benchmark of colonoscopy image segmentation, with the hope of establishing a new strong benchmark for colonoscopy image analysis research. The proposed dataset consists of 4 relevant classes to inspect the endoluminal scene, targeting different clinical needs. Together with the dataset and taking advantage of advances in semantic segmentation literature, we provide new baselines by training standard fully convolutional networks (FCNs). We perform a comparative study to show that FCNs significantly outperform, without any further postprocessing, prior results in endoluminal scene segmentation, especially with respect to polyp segmentation and localization.

2017-07-26

Journal of Healthcare Engineering (published)

doi.org

arxiv.org

Self-organized Hierarchical Softmax

Yikang Shen

Shawn Tan

Chris Pal

Aaron Courville

We propose a new self-organizing hierarchical softmax formulation for neural-network-based language models over large vocabularies. Instead of using a predefined hierarchical structure, our approach is capable of learning word clusters with clear syntactical and semantic meaning during the language model training process. We provide experiments on standard benchmarks for language modeling and sentence compression tasks. We find that this approach is as fast as other efficient softmax approximations, while achieving comparable or even better performance relative to similar full softmax models.

2017-07-26

ArXiv (preprint)

arxiv.org

Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support

M. Cardoso

Tal Arbel

G. Carneiro

T. Syeda-Mahmood

J. Tavares

Mehdi Moradi

Andrew P. Bradley

Hayit Greenspan

J. Papa

Anant Madabhushi

Jacinto C Nascimento

Jaime S. Cardoso

Vasileios Belagiannis

Zhi Lu

Faculdade Engenharia

2017-07-19

ArXiv (preprint)

doi.org

arxiv.org

A Closer Look at Memorization in Deep Networks

Devansh Arpit

Stanisław Jastrzębski

Nicolas Ballas

Maxinder S. Kanwal

Asja Fischer

We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. real data. We also demonstrate that for appropriately tuned explicit regularization (e.g., dropout) we can degrade DNN training performance on noise datasets without compromising generalization on real data. Our analysis suggests that the notions of effective capacity which are dataset independent are unlikely to explain the generalization performance of deep networks when trained with gradient based methods because training data itself plays an important role in determining the degree of memorization.

2017-07-17

Proceedings of the 34th International Conference on Machine Learning (published)

proceedings.mlr.press

arxiv.org

Accelerated Stochastic Power Iteration

Peng Xu

Bryan Dawei He

Christopher De Sa

Ioannis Mitliagkas

Christopher Re

Principal component analysis (PCA) is one of the most powerful tools in machine learning. The simplest method for PCA, the power iteration, requires $O(1/\Delta)$ full-data passes to recover the principal component of a matrix with eigen-gap $\Delta$. Lanczos, a significantly more complex method, achieves an accelerated rate of $O(1/\sqrt{\Delta})$ passes. Modern applications, however, motivate methods that only ingest a subset of available data, known as the stochastic setting. In the online stochastic setting, simple algorithms like Oja's iteration achieve the optimal sample complexity $O(\sigma^2/\Delta^2)$. Unfortunately, they are fully sequential, and also require $O(\sigma^2/\Delta^2)$ iterations, far from the $O(1/\sqrt{\Delta})$ rate of Lanczos. We propose a simple variant of the power iteration with an added momentum term, that achieves both the optimal sample and iteration complexity. In the full-pass setting, standard analysis shows that momentum achieves the accelerated rate, $O(1/\sqrt{\Delta})$. We demonstrate empirically that naively applying momentum to a stochastic method does not result in acceleration. We perform a novel, tight variance analysis that reveals the "breaking-point variance" beyond which this acceleration does not occur. By combining this insight with modern variance reduction techniques, we construct stochastic PCA algorithms, for the online and offline setting, that achieve an accelerated iteration complexity $O(1/\sqrt{\Delta})$. Due to the embarrassingly parallel nature of our methods, this acceleration translates directly to wall-clock time if deployed in a parallel environment. Our approach is very general, and applies to many non-convex optimization problems that can now be accelerated using the same technique.

2017-07-10

ArXiv (preprint)

arxiv.org
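
The abstract of Accelerated Stochastic Power Iteration above describes a power iteration with an added momentum term but does not spell out the update. The following is a minimal full-pass sketch, assuming the heavy-ball form w_next = A w - beta * w_prev on a symmetric matrix. The function name, the starting vector, and the choice of beta are illustrative assumptions, not the authors' implementation, and the stochastic and variance-reduced variants are not covered.

```python
import math


def mat_vec(a, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def power_iteration_momentum(a, beta, num_iters=200):
    """Estimate the top eigenvector of a symmetric matrix `a`.

    Assumed update (not taken from the abstract):
        w_next = A w - beta * w_prev
    Setting beta = 0 recovers the plain power iteration.
    """
    n = len(a)
    w_prev = [0.0] * n               # no momentum on the first step
    w = [1.0] + [0.0] * (n - 1)      # arbitrary start; must not be orthogonal
                                     # to the top eigenvector
    for _ in range(num_iters):
        aw = mat_vec(a, w)
        w_next = [x - beta * y for x, y in zip(aw, w_prev)]
        scale = norm(w_next)
        # Rescale both iterates by the same factor so the two-term
        # recurrence is preserved up to a common scalar.
        w_prev = [x / scale for x in w]
        w = [x / scale for x in w_next]
    return w


if __name__ == "__main__":
    # Symmetric 2x2 example with eigenvalues 3 and 1.
    a = [[2.0, 1.0], [1.0, 2.0]]
    # Illustrative choice: beta set from the second eigenvalue, which is
    # known here only because the example matrix is tiny.
    print(power_iteration_momentum(a, beta=0.25))
```

In practice the second eigenvalue is not known in advance, so beta would have to be estimated or tuned; the sketch fixes it by hand only to keep the example small.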

Learning Visual Reasoning Without Strong Priors

Ethan Perez

Harm de Vries

Florian Strub

Vincent Dumoulin

Aaron Courville

Achieving artificial visual reasoning - the ability to answer image-related questions which require a multi-step, high-level process - is an important step towards artificial general intelligence. This multi-modal task requires learning a question-dependent, structured reasoning process over images from language. Standard deep learning approaches tend to exploit biases in the data rather than learn this underlying structure, while leading methods learn to visually reason successfully but are hand-crafted for reasoning. We show that a general-purpose, Conditional Batch Normalization approach achieves state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4% error rate. We outperform the next best end-to-end method (4.5%) and even methods that use extra supervision (3.1%). We probe our model to shed light on how it reasons, showing it has learned a question-dependent, multi-step process. Previous work has operated under the assumption that visual reasoning calls for a specialized architecture, but we show that a general architecture with proper conditioning can learn to visually reason effectively.

2017-07-10

ArXiv (preprint)

arxiv.org
