Publications

Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation

Iulian Vlad Serban

Tim Klinger

Gerald Tesauro

Kartik Talamadupula

Bowen Zhou

We introduce the multiresolution recurrent neural network, which extends the sequence-to-sequence framework to model natural language generation as two parallel discrete stochastic processes: a sequence of high-level coarse tokens, and a sequence of natural language tokens. There are many ways to estimate or learn the high-level coarse tokens, but we argue that a simple extraction procedure is sufficient to capture a wealth of high-level discourse semantics. Such a procedure allows training the multiresolution recurrent neural network by maximizing the exact joint log-likelihood over both sequences. In contrast to the standard log-likelihood objective w.r.t. natural language tokens (word perplexity), optimizing the joint log-likelihood biases the model towards modeling high-level abstractions. We apply the proposed model to the task of dialogue response generation in two challenging domains: the Ubuntu technical support domain, and Twitter conversations. On Ubuntu, the model outperforms competing approaches by a substantial margin, achieving state-of-the-art results according to both automatic evaluation metrics and a human evaluation study. On Twitter, the model appears to generate more relevant and on-topic responses according to automatic evaluation metrics. Finally, our experiments demonstrate that the proposed model is more adept at overcoming the sparsity of natural language and is better able to capture long-term structure.
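As a reading aid, the joint objective described in this abstract can be sketched with the chain rule. The notation is an assumption, not taken from the abstract: w stands for the natural language tokens, z for the coarse tokens, and f for the extraction procedure that computes z from w.

```latex
% Assumed notation: \mathbf{w} = natural language tokens,
% \mathbf{z} = f(\mathbf{w}) = coarse tokens produced by the extraction procedure.
\log P_\theta(\mathbf{w}, \mathbf{z})
  = \log P_\theta(\mathbf{z}) + \log P_\theta(\mathbf{w} \mid \mathbf{z})
```

Because z is extracted from w rather than treated as a latent variable, both terms are observed at training time, so no marginalization over z is needed. This is one plausible reading of "exact joint log-likelihood"; the paper's own factorization may be finer-grained.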

2017-02-11

AAAI Conference on Artificial Intelligence (published)

doi.org

arxiv.org

Real-Time Indoor Localization in Smart Homes Using Semi-Supervised Learning

Negar Ghourchian

Michel Allegue-Martínez

Doina Precup

Long-term automated monitoring of residential or small industrial properties is an important task within the broader scope of human activity recognition. We present a device-free wifi-based localization system for smart indoor spaces, developed in a collaboration between McGill University and Aerîal Technologies. The system relies on existing wifi network signals and semi-supervised learning, in order to automatically detect entrance into a residential unit, and track the location of a moving subject within the sensing area. The implemented real-time monitoring platform works by detecting changes in the characteristics of the wifi signals collected via existing off-the-shelf wifi-enabled devices in the environment. This platform has been deployed in several apartments in the Montreal area, and the results obtained show the potential of this technology to turn any regular home with an existing wifi network into a smart home equipped with intruder alarm and room-level location detector. The machine learning component has been devised so as to minimize the need for user annotation and overcome temporal instabilities in the input signals. We use a semi-supervised learning framework which works in two phases. First, we build a base learner for mapping wifi signals to different physical locations in the environment from a small amount of labeled data; during its lifetime, the learner automatically re-trains when the uncertainty level rises significantly, without the need for further supervision. This paper describes the technical and practical issues arising in the design and implementation of such a system for real residential units, and illustrates its performance during on-going deployment.

2017-02-10

AAAI Conference on Artificial Intelligence (published)

doi.org

Adversarially Learned Inference

Vincent Dumoulin

Ishmael Belghazi

Ben Poole

We introduce the adversarially learned inference (ALI) model, which jointly learns a generation network and an inference network using an adversarial process. The generation network maps samples from stochastic latent variables to the data space while the inference network maps training examples in data space to the space of latent variables. An adversarial game is cast between these two networks and a discriminative network is trained to distinguish between joint latent/data-space samples from the generative network and joint samples from the inference network. We illustrate the ability of the model to learn mutually coherent inference and generation networks through the inspections of model samples and reconstructions and confirm the usefulness of the learned representations by obtaining a performance competitive with state-of-the-art on the semi-supervised SVHN and CIFAR10 tasks.

2017-02-05

International Conference on Learning Representations (poster)

doi.org

openreview.net

Calibrating Energy-based Generative Adversarial Networks

Zihang Dai

Amjad Almahairi

Philip Bachman

Eduard Hovy

Aaron Courville

In this paper, we propose to equip Generative Adversarial Networks with the ability to produce direct energy estimates for samples. Specifically, we propose a flexible adversarial training framework, and prove that this framework not only ensures the generator converges to the true data distribution, but also enables the discriminator to retain the density information at the global optimum. We derive the analytic form of the induced solution, and analyze its properties. In order to make the proposed framework trainable in practice, we introduce two effective approximation techniques. Empirically, the experimental results closely match our theoretical analysis, verifying that the discriminator is able to recover the energy of the data distribution.

2017-02-05

ICLR.cc/2017/conference (poster)

openreview.net

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

Jose Sotelo

In this paper we propose a novel model for the unconditional audio generation task that generates one audio sample at a time. We show that our model, which profits from combining memory-less modules, namely autoregressive multilayer perceptrons, and stateful recurrent neural networks in a hierarchical structure, is able to capture the underlying sources of variation in the temporal domain over very long time spans on three datasets of different nature. Human evaluation of the generated samples indicates that our model is preferred over competing models. We also show how each component of the model contributes to the exhibited performance.

2017-02-05

ICLR.cc/2017/conference (poster)

openreview.net

Training End-to-End Dialogue Systems with the Ubuntu Dialogue Corpus

Iulian Vlad Serban

Chia-Wei Liu

In this paper, we construct and train end-to-end neural network-based dialogue systems using an updated version of the recent Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This dataset is interesting because of its size, long context lengths, and technical nature; thus, it can be used to train large models directly from data with minimal feature engineering, which can be both time-consuming and expensive. We provide baselines in two different environments: one where models are trained to maximize the log-likelihood of a generated utterance conditioned on the context of the conversation, and one where models are trained to select the correct next response from a list of candidate responses. These are both evaluated on a recall task that we call Next Utterance Classification (NUC), as well as other generation-specific metrics. Finally, we provide a qualitative error analysis to help determine the most promising directions for future research on the Ubuntu Dialogue Corpus, and for end-to-end dialogue systems in general.

2017-01-19

Dialogue & Discourse (published)

doi.org

An Actor-Critic Algorithm for Sequence Prediction

We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes, as models must generate tokens conditioned on their previous guesses rather than the ground-truth tokens. We address this problem by introducing a critic network that is trained to predict the value of an output token, given the policy of an actor network. This results in a training procedure that is much closer to the test phase, and allows us to directly optimize for a task-specific score such as BLEU. Crucially, since we leverage these techniques in the supervised learning setting rather than the traditional RL setting, we condition the critic network on the ground-truth output. We show that our method leads to improved performance on both a synthetic task, and for German-English machine translation. Our analysis paves the way for such methods to be applied in natural language generation tasks, such as machine translation, caption generation, and dialogue modelling.

2016-12-31

ICLR.cc/2017/conference (poster)

doi.org

openreview.net

BOUNDS LEAD TO IMPROVED CLASSIFIERS

Nicolas Roux

The standard approach to supervised classification involves the minimization of a log-loss as an upper bound to the classification error. While this is a tight bound early on in the optimization, it overemphasizes the influence of incorrectly classified examples far from the decision boundary. Updating the upper bound during the optimization leads to improved classification rates while transforming the learning into a sequence of minimization problems. In addition, in the context where the classifier is part of a larger system, this modification makes it possible to link the performance of the classifier to that of the whole system, allowing the seamless introduction of external constraints.
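The first two claims in this abstract can be checked numerically in a simplified setting. The sketch below is an illustration, not the paper's formulation: it assumes binary labels, a base-2 logistic loss, and a hypothetical array `margins` holding the signed margin y·f(x) of each example (negative means misclassified).

```python
import numpy as np

# Hypothetical signed margins y * f(x); negative values are misclassified examples.
margins = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])

# Classification error (0-1 loss), counting a margin of exactly 0 as an error.
zero_one = (margins <= 0).astype(float)

# Base-2 logistic loss, which equals 1 at margin 0 and upper-bounds the 0-1 loss.
log_loss = np.log2(1.0 + np.exp(-margins))

for m, e, l in zip(margins, zero_one, log_loss):
    print(f"margin={m:6.1f}  zero_one={e:.0f}  log_loss={l:.3f}")
```

The bound holds for every margin, but for the example at margin -10 the log-loss is more than ten times the error it bounds, which is the overemphasis of far-away misclassified examples that the abstract describes.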

2016-12-31

(published)

www.semanticscholar.org

Computer Assisted and Robotic Endoscopy and Clinical Image-Based Procedures

M. Jorge Cardoso

Tal Arbel

Xiongbiao Luo

Stefan Wesarg

Tobias Reichl

M. Ballester

Jonathan Mcleod

Klaus Drechsler

T. Peters

Marius Erdt

Kensaku Mori

M. Linguraru

Andreas Uhl

Cristina Oyarzun Laura

R. Shekhar

2016-12-31

Lecture Notes in Computer Science (published)

doi.org

Computer-Assisted Conceptual Analysis of Textual Data as Applied to Philosophical Corpuses

Jean Guy Meunier

L. Chartrand

Jackie CK Cheung

Mathieu Valette

Marie-Noëlle Bayle

2016-12-31

DH (published)

dblp.uni-trier.de

Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support

M. Jorge Cardoso

Tal Arbel

G. Carneiro

T. Syeda-Mahmood

J. Tavares

Mehdi Moradi

Andrew P. Bradley

Hayit Greenspan

J. Papa

Anant Madabhushi

Jacinto C Nascimento

Jaime S. Cardoso

Vasileios Belagiannis

Zhi Lu

Faculdade Engenharia

2016-12-31

Lecture Notes in Computer Science (published)

doi.org

arxiv.org

Diet Networks: Thin Parameters for Fat Genomics

Adriana Romero

Marie-Pierre Dubé

Julie G. Hussin

Yoshua Bengio

Learning tasks such as those involving genomic data often pose a serious challenge: the number of input features can be orders of magnitude larger than the number of training examples, making it difficult to avoid overfitting, even when using the known regularization techniques. We focus here on tasks in which the input is a description of the genetic variation specific to a patient, the single nucleotide polymorphisms (SNPs), yielding millions of ternary inputs. Improving the ability of deep learning to handle such datasets could have an important impact in precision medicine, where high-dimensional data regarding a particular patient is used to make predictions of interest. Even though the amount of data for such tasks is increasing, this mismatch between the number of examples and the number of inputs remains a concern. Naive implementations of classifier neural networks involve a huge number of free parameters in their first layer: each input feature is associated with as many parameters as there are hidden units. We propose a novel neural network parametrization which considerably reduces the number of free parameters. It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g. for each position in the genome where variations are observed), and then learn (with another neural network called the parameter prediction network) how to map a feature's distributed representation to the vector of parameters specific to that feature in the classifier neural network (the weights which link the value of the feature to each of the hidden units). We show experimentally on a population stratification task of interest to medical studies that the proposed approach can significantly reduce both the number of parameters and the error rate of the classifier.
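The parametrization described in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the sizes `n_features`, `n_hidden` and `emb_dim` are hypothetical, the feature embeddings are random placeholders standing in for provided representations, and the parameter prediction network is reduced to a single linear map.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features = 10_000  # hypothetical; the abstract mentions millions of SNP inputs
n_hidden = 100       # hidden units in the classifier's first layer (assumed)
emb_dim = 50         # size of each feature's distributed representation (assumed)

# Naive first layer: one free parameter per (feature, hidden unit) pair.
direct_params = n_features * n_hidden

# Provided (not learned) feature embeddings; random placeholders here.
feature_emb = rng.normal(size=(n_features, emb_dim))

# Parameter prediction network, simplified to one linear map.
# Its weights are the only free parameters of the first layer in this sketch.
pred_weights = rng.normal(scale=0.01, size=(emb_dim, n_hidden))
predicted_params = pred_weights.size

def first_layer(x):
    """Predict the first-layer weights from the embeddings, then apply them."""
    w = feature_emb @ pred_weights  # shape (n_features, n_hidden)
    return np.tanh(x @ w)

# A small batch of ternary inputs (values 0, 1 or 2).
x = rng.integers(0, 3, size=(4, n_features)).astype(float)
h = first_layer(x)

print(direct_params, predicted_params, h.shape)
```

With these assumed sizes the first layer goes from 1,000,000 free parameters to 5,000, and the count no longer grows with the number of input features. If the embeddings were learned rather than provided, their parameters would have to be counted as well.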

2016-12-31

ICLR.cc/2017/conference (poster)

doi.org

openreview.net
