Publications

Building a Neural Semantic Parser from a Domain Ontology

Jianpeng Cheng

Mirella Lapata

Semantic parsing is the task of converting natural language utterances into machine interpretable meaning representations which can be executed against a real-world environment such as a database. Scaling semantic parsing to arbitrary domains faces two interrelated challenges: obtaining broad coverage training data effectively and cheaply; and developing a model that generalizes to compositional utterances and complex intentions. We address these challenges with a framework which allows to elicit training data from a domain ontology and bootstrap a neural parser which recursively builds derivations of logical forms. In our framework meaning representations are described by sequences of natural language templates, where each template corresponds to a decomposed fragment of the underlying meaning representation. Although artificial, templates can be understood and paraphrased by humans to create natural utterances, resulting in parallel triples of utterances, meaning representations, and their decompositions. These allow us to train a neural semantic parser which learns to compose rules in deriving meaning representations. We crowdsource training data on six domains, covering both single-turn utterances which exhibit rich compositionality, and sequential utterances where a complex task is procedurally performed in steps. We then develop neural semantic parsers which perform such compositional tasks. In general, our approach allows to deploy neural semantic parsers quickly and cheaply from a given domain ontology.

2018-12-25

ArXiv (preprint)

arxiv.org

Clustering-Oriented Representation Learning with Attractive-Repulsive Loss

Kian Kenyon-Dean

Andre Cianflone

Lucas Caccia

Guillaume Rabusseau

Jackie Cheung

Doina Precup

The standard loss function used to train neural network classifiers, categorical cross-entropy (CCE), seeks to maximize accuracy on the training data; building useful representations is not a necessary byproduct of this objective. In this work, we propose clustering-oriented representation learning (COREL) as an alternative to CCE in the context of a generalized attractive-repulsive loss framework. COREL has the consequence of building latent representations that collectively exhibit the quality of natural clustering within the latent space of the final hidden layer, according to a predefined similarity function. Despite being simple to implement, COREL variants outperform or perform equivalently to CCE in a variety of scenarios, including image and news article classification using both feed-forward and convolutional neural networks. Analysis of the latent spaces created with different similarity functions facilitates insights on the different use cases COREL variants can satisfy, where the Cosine-COREL variant makes a consistently clusterable latent space, while Gaussian-COREL consistently obtains better classification accuracy than CCE.

2018-12-18

ArXiv (preprint)

arxiv.org

Learning Typed Entailment Graphs with Global Soft Constraints

Mohammad Javad Hosseini

Nathanael Chambers

Siva Reddy

Xavier R. Holt

Shay B. Cohen

Mark Johnson

Mark Steedman

This paper presents a new method for learning typed entailment graphs from text. We extract predicate-argument structures from multiple-source news corpora, and compute local distributional similarity scores to learn entailments between predicates with typed arguments (e.g., person contracted disease). Previous work has used transitivity constraints to improve local decisions, but these constraints are intractable on large graphs. We instead propose a scalable method that learns globally consistent similarity scores based on new soft constraints that consider both the structures across typed entailment graphs and inside each graph. Learning takes only a few hours to run over 100K predicates and our results show large improvements over local similarity scores on two entailment data sets. We further show improvements over paraphrases and entailments from the Paraphrase Database, and prior state-of-the-art entailment graphs. We show that the entailment graphs improve performance in a downstream task.

2018-12-01

Transactions of the Association for Computational Linguistics (published)

doi.org

Contextual Bandits for Adapting Treatment in a Mouse Model of de Novo Carcinogenesis

Audrey Durand

Charis Achilleos

Demetris C Iacovides

Katerina Strati

Georgios D. Mitsis

Joelle Pineau

In this work, we present a specific case study where we aim to design effective treatment allocation strategies and validate these using a mouse model of skin cancer. Collecting data for modelling treatments effectiveness on animal models is an expensive and time consuming process. Moreover, acquiring this information during the full range of disease stages is hard to achieve with a conventional random treatment allocation procedure, as poor treatments cause deterioration of subject health. We therefore aim to design an adaptive allocation strategy to improve the efficiency of data collection by allocating more samples for exploring promising treatments. We cast this application as a contextual bandit problem and introduce a simple and practical algorithm for exploration-exploitation in this framework. The work builds on a recent class of approaches for non-contextual bandits that relies on subsampling to compare treatment options using an equivalent amount of information. On the technical side, we extend the subsampling strategy to the case of bandits with context, by applying subsampling within Gaussian Process regression. On the experimental side, preliminary results using 10 mice with skin tumours suggest that the proposed approach extends by more than 50% the subjects life duration compared with baseline strategies: no treatment, random treatment allocation, and constant chemotherapeutic agent. By slowing the tumour growth rate, the adaptive procedure gathers information about treatment effectiveness on a broader range of tumour volumes, which is crucial for eventually deriving sequential pharmacological treatment strategies for cancer.

2018-11-29

Proceedings of the 3rd Machine Learning for Healthcare Conference (published)

proceedings.mlr.press

Understanding the impact of entropy in policy learning

Zafarali Ahmed

Nicolas Le Roux

Mohammad Norouzi

Dale Schuurmans

Entropy regularization is commonly used to improve policy optimization in reinforcement learning. It is believed to help with exploration by encouraging the selection of more stochastic policies. In this work, we analyze this claim using new visualizations of the optimization landscape based on randomly perturbing the loss function. We first show that even with access to the exact gradient, policy optimization is difficult due to the geometry of the objective function. Then, we qualitatively show that in some environments, a policy with higher entropy can make the optimization landscape smoother, thereby connecting local optima and enabling the use of larger learning rates. This paper presents new tools for understanding the optimization landscape, shows that policy entropy serves as a regularizer, and highlights the challenge of designing general-purpose policy optimization algorithms.

2018-11-27

(published)

www.semanticscholar.org

Environments for Lifelong Reinforcement Learning

Khimya Khetarpal

Shagun Sodhani

Sarath Chandar Anbil Parthipan

Doina Precup

To achieve general artificial intelligence, reinforcement learning (RL) agents should learn not only to optimize returns for one specific task but also to constantly build more complex skills and scaffold their knowledge about the world, without forgetting what has already been learned. In this paper, we discuss the desired characteristics of environments that can support the training and evaluation of lifelong reinforcement learning agents, review existing environments from this perspective, and propose recommendations for devising suitable environments in the future.

2018-11-26

ArXiv (preprint)

arxiv.org

Multi-task Learning over Graph Structures

Pengfei Liu

Jie Fu

Yue Dong

Xipeng Qiu

Jackie Cheung

We present two architectures for multi-task learning with neural sequence models. Our approach allows the relationships between different tasks to be learned dynamically, rather than using an ad-hoc pre-defined structure as in previous work. We adopt the idea from message-passing graph neural networks and propose a general graph multi-task learning framework in which different tasks can communicate with each other in an effective and interpretable way. We conduct extensive experiments in text classification and sequence labeling to evaluate our approach on multi-task learning and transfer learning. The empirical results show that our models not only outperform competitive baselines but also learn interpretable and transferable patterns across tasks.

2018-11-26

ArXiv (preprint)

arxiv.org

Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge

Spyridon Bakas

Mauricio Reyes

Andras Jakab

Stefan Bauer

Markus Rempfler

Alessandro Crimi

Russell T. Shinohara

Christoph Berger

Sung-min Ha

Martin Rozycki

Marcel W. Prastawa

Esther Alberts

Jana Lipková

John Freymann

Justin Kirby

Michel Bilello

Hassan M. Fathallah-Shaykh

Roland Wiest

J. Kirschke

Benedikt Wiestler

Rivka R. Colen

Aikaterini Kotrotsou

Pamela LaMontagne

D. Marcus

Mikhail Milchenko

Arash Nazeri

Marc-André Weber

Abhishek Mahajan

Ujjwal Baid

Dongjin Kwon

Manu Agarwal

Mahbubul Alam

Alberto Albiol

A. Albiol

Alex A. Varghese

T. Tuan

Tal Arbel

Aaron J. Avery

Bobade Pranjal

Subhashis Banerjee

Thomas H. Batchelder

Nematollah Batmanghelich

Enzo Battistella

Martin Bendszus

E. Benson

José Bernal

George Biros

Mariano Cabezas

Siddhartha Chandra

Yi-Ju Chang

et al.

Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.

2018-11-05

ArXiv (preprint)

doi.org

arxiv.org

On the Evaluation of Common-Sense Reasoning in Natural Language Understanding

Paul Trichelair

Ali Emami

Jackie Cheung

Adam Trischler

Kaheer Suleman

Fernando Diaz

The NLP and ML communities have long been interested in developing models capable of common-sense reasoning, and recent works have significantly improved the state of the art on benchmarks like the Winograd Schema Challenge (WSC). Despite these advances, the complexity of tasks designed to test common-sense reasoning remains under-analyzed. In this paper, we make a case study of the Winograd Schema Challenge and, based on two new measures of instance-level complexity, design a protocol that both clarifies and qualifies the results of previous work. Our protocol accounts for the WSC's limited size and variable instance difficulty, properties common to other common-sense benchmarks. Accounting for these properties when assessing model results may prevent unjustified conclusions.

2018-11-05

arXiv.org (preprint)

dblp.uni-trier.de

The Hard-CoRe Coreference Corpus: Removing Gender and Number Cues for Difficult Pronominal Anaphora Resolution

Ali Emami

Paul Trichelair

Adam Trischler

Kaheer Suleman

Hannes Schulz

Jackie Cheung

We introduce a new benchmark task for coreference resolution, Hard-CoRe, that targets common-sense reasoning and world knowledge. Previous coreference resolution tasks have been overly vulnerable to systems that simply exploit the number and gender of the antecedents, or have been handcrafted and do not reflect the diversity of sentences in naturally occurring text. With these limitations in mind, we present a resolution task that is both challenging and realistic. We demonstrate that various coreference systems, whether rule-based, feature-rich, graphical, or neural-based, perform at random or slightly above-random on the task, whereas human performance is very strong with high inter-annotator agreement. To explain this performance gap, we show empirically that state-of-the-art models often fail to capture context and rely only on the antecedents to make a decision.

2018-11-02

ArXiv (preprint)

arxiv.org

The KnowRef Coreference Corpus: Removing Gender and Number Cues for Difficult Pronominal Anaphora Resolution

Ali Emami

Paul Trichelair

Adam Trischler

Kaheer Suleman

Hannes Schulz

Jackie Cheung

We introduce a new benchmark for coreference resolution and NLI, KnowRef, that targets common-sense understanding and world knowledge. Previous coreference resolution tasks can largely be solved by exploiting the number and gender of the antecedents, or have been handcrafted and do not reflect the diversity of naturally occurring text. We present a corpus of over 8,000 annotated text passages with ambiguous pronominal anaphora. These instances are both challenging and realistic. We show that various coreference systems, whether rule-based, feature-rich, or neural, perform significantly worse on the task than humans, who display high inter-annotator agreement. To explain this performance gap, we show empirically that state-of-the-art models often fail to capture context, instead relying on the gender or number of candidate antecedents to make a decision. We then use problem-specific insights to propose a data-augmentation trick called antecedent switching to alleviate this tendency in models. Finally, we show that antecedent switching yields promising results on other tasks as well: we use it to achieve state-of-the-art results on the GAP coreference task.

2018-11-02

Annual Meeting of the Association for Computational Linguistics (published)

doi.org

Automatic differentiation in ML: Where we are and where we should be going

Bart van Merriënboer

Olivier Breuleux

Arnaud Bergeron

Pascal Lamblin

We review the current state of automatic differentiation (AD) for array programming in machine learning (ML), including the different approaches such as operator overloading (OO) and source transformation (ST) used for AD, graph-based intermediate representations for programs, and source languages. Based on these insights, we introduce a new graph-based intermediate representation (IR) which specifically aims to efficiently support fully-general AD for array programming. Unlike existing dataflow programming representations in ML frameworks, our IR naturally supports function calls, higher-order functions and recursion, making ML models easier to implement. The ability to represent closures allows us to perform AD using ST without a tape, making the resulting derivative (adjoint) program amenable to ahead-of-time optimization using tools from functional language compilers, and enabling higher-order derivatives. Lastly, we introduce a proof of concept compiler toolchain called Myia which uses a subset of Python as a front end.

2018-10-01

ArXiv (preprint)

arxiv.org
